Methods and compositions for categorizing patients

ABSTRACT

The disclosure provides, among other things, molecular markers for categorizing the neoplastic state of a patient, methods for using the molecular markers in diagnostic tests, nucleic acid and amino acid sequences related to the molecular markers, reagents for detection of molecular markers, and methods for identifying candidate molecular markers in highly parallel gene expression data.

FUNDING

Work described herein was funded, in part, by grant number 1 U01CA-88130-01 from the National Cancer Institute. The United States government has certain rights in the invention.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 12/386,176, filed on Apr. 13, 2009, which is a continuation of U.S. patent application Ser. No. 10/649,591 (abandoned), filed Aug. 26, 2003, which is a continuation-in-part of U.S. patent application Ser. No. 10/274,177 (now U.S. Pat. No. 7,118,912), filed Oct. 18, 2002, which is a continuation-in-part of U.S. patent application Ser. No. 10/229,345 (now U.S. Pat. No. 7,081,516), filed Aug. 26, 2002. The disclosures of each of the foregoing applications are hereby incorporated by reference in their entirety.

BACKGROUND

Colorectal cancer, also referred to herein as colon cancer, is the second leading cause of cancer mortality in the adult American population. An estimated 135,000 new cases of colon cancer occur each year. Although many people die of colon cancer, early stage colon cancers are often treatable by surgical removal (resection) of the affected tissue. Surgical treatment can be combined with chemotherapeutic agents to achieve an even higher survival rate in certain colon cancers. However, the survival rate drops to 5% or less over five years in patients with metastatic (late stage) colon cancer.

Effective screening and early identification of affected patients coupled with appropriate therapeutic intervention is proven to reduce the number of colon cancer mortalities. It is estimated that 74,000,000 older Americans would benefit from regular screening for colon cancer and precancerous colon adenomas (together, adenomas and colon cancers may be referred to as colon neoplasias). However, present systems for screening for colon neoplasia are inadequate. For example, the Fecal Occult Blood Test involves testing a stool sample from a patient for the presence of blood. This test is relatively simple and inexpensive, but it often fails to detect colon neoplasia (low sensitivity), and often, even when blood is detected in the stool, a colon neoplasia is not present (low specificity). Flexible sigmoidoscopy involves the insertion of a short scope into the rectum to visually inspect the lower third of the colon. Because the sigmoidoscope is relatively short, it is also a relatively uncomplicated diagnostic method. However, nearly half of all colon neoplasia occurs in the upper portions of the colon that cannot be viewed with the sigmoidoscope. Colonoscopy, in which a scope is threaded through the entire length of the colon, provides a very reliable method of detecting colon neoplasia in a subject, but colonoscopy is costly, time consuming and requires sedation of the patient.

Modern molecular biology has made it possible to identify proteins and nucleic acids that are specifically associated with certain physiological states. These molecular markers have revolutionized diagnostics for a variety of health conditions ranging from pregnancy to viral infections, such as HIV.

Researchers generally identify molecular markers for a health condition by searching for genes and proteins that are expressed at different levels in one health condition versus another (e.g. in pregnant women versus women who are not pregnant). Traditional methods for pursuing this research, such as Northern blots and reverse transcriptase polymerase chain reaction, allow a researcher to study only a handful of potential molecular markers at a time. Microarrays, consisting of an ordered array of hundreds or thousands of probes for detection of hundreds or thousands of gene transcripts, allow researchers to gather data on many potential molecular markers in a single experiment. Researchers now face the challenge of sifting through large quantities of microarray-generated gene expression data to identify genes that may be of genuine use as molecular markers to distinguish different health conditions.

Improved systems for identifying high quality candidate molecular markers in large volumes of gene expression data may help to unlock the power of such tools and increase the likelihood of identifying a molecular marker for important disease states, such as colon neoplasia. Effective molecular markers for colon neoplasia could potentially revolutionize the diagnosis, management and overall health impact of colon cancer.

BRIEF SUMMARY

This application is based at least in part on the selection of useful molecular markers of colon neoplasia. Colon neoplasia is a multi-stage process involving progression from normal healthy tissues to the development of pre-cancerous colon adenomas to more invasive stages of colon cancer such as the Dukes A and Dukes B stages and finally to metastatic stages such as Dukes C and Dukes D stages of colon cancer.

In one aspect, this application provides molecular markers that are useful in the detection or diagnosis of colon neoplasia. In certain embodiments, molecular markers described in the application are helpful in distinguishing normal subjects from those who are likely to develop colon neoplasia or are likely to harbor a colon adenoma. In other aspects, the invention provides molecular markers that may be useful in distinguishing subjects who are either normal or precancerous from those who have colon cancer. In another embodiment, the application provides markers that help in staging the colon cancer in patients. In still other embodiments, the application contemplates the use of one or more of the molecular markers described herein for the detection, diagnosis, and staging of colon neoplasias.

In one aspect, the application provides a method of screening a subject for a condition associated with increased levels of one or more molecular markers that are indicative of colon neoplasia, such as, for example, ColoUp1-ColoUp8 and osteopontin. In a preferred embodiment, the application provides a method for screening a subject for conditions associated with secreted markers such as ColoUp1 or ColoUp2, by detecting in a biological sample an amount of ColoUp1 or ColoUp2 and comparing the amount of ColoUp1 and ColoUp2 found in the subject to one or more of the following: a predetermined standard, the amount of ColoUp1 or ColoUp2 detected in a normal sample from the subject, the subject's historical baseline level of ColoUp1 or ColoUp2, or the ColoUp1 or ColoUp2 level detected in a different, normal subject (a control subject). Detection of a level of ColoUp1 and ColoUp2 in the subject that is greater than that of the predetermined standard or that is increased from a subject's past baseline is indicative of a condition such as colon neoplasia. In certain aspects, an increase in the amount of ColoUp1 or ColoUp2 as compared to the subject's historical baseline would be indicative of a new neoplasm, or progression of an existing neoplasm. Similarly, a decrease in the amount of ColoUp1 or ColoUp2 as compared to the subject's historical baseline would be indicative of regression of an existing neoplasm.
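
The comparison logic described in the preceding paragraph can be sketched as follows. This is a minimal illustration only, assuming one numeric measurement per marker in arbitrary but consistent units; the function name `screen_marker`, its parameters, and the example values are hypothetical and are not taken from this application.

```python
from typing import Optional


def screen_marker(measured: float,
                  standard: Optional[float] = None,
                  baseline: Optional[float] = None) -> dict:
    """Compare a measured marker level with optional reference values.

    measured: marker level found in the subject's biological sample.
    standard: a predetermined standard level, if one is available.
    baseline: the subject's own historical level, if one is available.
    All three values are assumed to share the same (hypothetical) units.
    """
    result = {"above_standard": None, "baseline_trend": None}

    # A level greater than the predetermined standard is the first
    # indicator mentioned in the text.
    if standard is not None:
        result["above_standard"] = measured > standard

    # A change relative to the historical baseline is the second indicator:
    # an increase suggests a new or progressing neoplasm, a decrease
    # suggests regression.
    if baseline is not None:
        if measured > baseline:
            result["baseline_trend"] = "increase"
        elif measured < baseline:
            result["baseline_trend"] = "decrease"
        else:
            result["baseline_trend"] = "unchanged"

    return result


# Hypothetical values, for illustration only.
print(screen_marker(measured=12.0, standard=10.0, baseline=8.0))
```

A real assay would also need to account for measurement variability before calling a difference an increase or a decrease; that step is omitted from this sketch.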

In one aspect, the molecular markers described herein are encoded by a nucleic acid sequence that is at least 90%, 95%, 98%, 99%, 99.3%, 99.5% or 99.7% identical to the nucleic acid sequence of SEQ ID Nos: 4-12, and more preferably to the nucleic acid sequences as set forth in SEQ ID Nos: 4-5. In another aspect, the application provides markers that are encoded by a nucleic acid sequence that hybridizes under high stringency conditions to the nucleic acid sequences of SEQ ID Nos: 4-12, more preferably to the nucleic acid sequences as set forth in SEQ ID Nos: 4-5.
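
To make the percent-identity thresholds concrete, the following sketch computes identity between two sequences that have already been aligned to equal length. It is an illustration under stated assumptions, not the method used in this application: conventions for percent identity differ (for example, in how gaps and the denominator are treated), and the function names and the short example sequences are hypothetical.

```python
# Identity thresholds recited in the text, in percent.
THRESHOLDS = (90.0, 95.0, 98.0, 99.0, 99.3, 99.5, 99.7)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over two pre-aligned sequences of equal length.

    Assumed convention: '-' marks a gap, a gap opposite a residue counts
    as a mismatch, and columns where both sequences have a gap are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    # Keep every alignment column except those that are gaps in both rows.
    columns = [(a, b)
               for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("no comparable positions")

    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


def thresholds_met(identity: float) -> list:
    """Return the recited thresholds that the given identity satisfies."""
    return [t for t in THRESHOLDS if identity >= t]


# Hypothetical 10-base sequences differing at one position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
print(identity, thresholds_met(identity))
```

In practice the alignment itself would come from a sequence alignment tool; only the counting step is shown here.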

In another aspect, the application provides molecular markers that are diagnostic of colon neoplasia, said markers having an amino acid sequence that is at least 90%, 95%, 98%, 99%, 99.3%, 99.5% or 99.7% identical to the amino acid sequence as set forth in SEQ ID Nos: 1-3 or 13-20, more preferably the amino acid sequence as set forth in SEQ ID Nos: 3 and 14.

In one aspect, the application provides methods for detecting secreted polypeptide forms of a ColoUp1-ColoUp8 polypeptide or osteopontin in biological samples. In other aspects, the application provides methods for imaging a colon neoplasm by targeting antibodies to any one of the markers ColoUp1 through ColoUp8 described herein, and in preferred embodiments, the antibodies are targeted to ColoUp3. In certain aspects, the application provides methods for administering an imaging agent comprising a targeting moiety and an active moiety. The targeting moiety may be an antibody, Fab, F(Ab)2, a single chain antibody or other binding agent that interacts with an epitope specified by a polypeptide sequence having an amino acid sequence as set forth in SEQ ID Nos: 1-3 and 13-20. The active moiety may be a radioactive agent, such as radioactive technetium, radioactive indium, or radioactive iodine. The imaging agent is administered in an amount effective for diagnostic use in a mammal such as a human and the localization and accumulation of the imaging agent is then detected. The localization and accumulation of the imaging agent may be detected by radioscintigraphy, nuclear magnetic resonance imaging, computed tomography or positron emission tomography.

In a preferred embodiment, the application provides methods for detecting a polypeptide comprising an amino acid sequence as set forth in one of SEQ ID Nos: 1-3. As will be apparent to the skilled artisan, the molecular markers described herein may be detected in a number of ways such as by various assays, including antibody-based assays. Examples of antibody-based assays include immunoprecipitation assays, Western blots, radioimmunoassays or enzyme-linked immunosorbent assays (ELISAs). Molecular markers described herein may be detected by assays that do not employ an antibody, such as by methods employing two-dimensional gel electrophoresis, methods employing mass spectroscopy, methods employing suitable enzymatic activity assays, etc. In a preferred embodiment, the application provides methods for the detection of secreted markers such as ColoUp1 or ColoUp2 polypeptides in blood, blood fractions (such as blood serum or blood plasma), urine or stool samples. Increased levels of these markers may be associated with a number of conditions such as, for example, colon neoplasia, including colon adenomas, colon cancer, and metastatic colon cancer. In certain aspects, the application provides methods including the detection of more than one marker that is indicative of colon neoplasia, such as methods for detecting both ColoUp1 and ColoUp2. In yet another aspect, combinations of the ColoUp markers may be useful, for instance, a combination of tests including testing biological samples for secreted markers such as ColoUp1 or ColoUp2 in combination with testing for transmembrane markers such as ColoUp3 as targets for imaging agents.

In yet another aspect, the application provides a method of determining whether a subject is likely to develop colon cancer or is more likely to harbor a precancerous colon adenoma by detecting the presence or absence of the molecular markers as set forth in SEQ ID Nos: 1-3. Detection of combinations of these markers is also helpful in staging the colon neoplasias.

In yet another aspect, the application provides markers that are useful in distinguishing normal and precancerous subjects from those subjects having colon cancer. In certain embodiments, the application contemplates determining the levels of markers provided herein such as ColoUp1 through ColoUp8 and osteopontin. In one aspect, markers such as ColoUp6 and osteopontin are helpful in distinguishing between the category of patients that are normal or have precancerous colon adenomas and the category of patients having colon cancer. In another aspect, the application provides detection of one or more of said markers in determining the stages of colon neoplasia.

In certain aspects, the invention provides an immunoassay for determining the presence of any one of the polypeptides having an amino acid sequence as set forth in SEQ ID Nos: 1-3 and 13-20, more preferably any one of the polypeptides having an amino acid sequence as set forth in SEQ ID Nos: 1-3, in a biological sample. The method includes obtaining a biological sample and contacting the sample with an antibody specific for a polypeptide having an amino acid sequence as set forth in SEQ ID Nos: 1-3 and detecting the binding of the antibody.

In some aspects, the application provides methods for the detection of a molecular marker in a biological sample such as blood, including blood fractions such as serum or plasma. For instance, the blood sample obtained from a patient may be further processed such as by fractionation to obtain blood serum, and the serum may then be enriched for certain polypeptides. The serum so enriched is then contacted with an antibody that is reactive with an epitope of the desired marker polypeptide.

In yet another embodiment, the application provides methods for determining the appropriate therapeutic protocol for a subject. For example, detection of a colon neoplasia provides the treating physician valuable information in determining whether intensive or invasive protocols such as colonoscopy, surgery or chemotherapy would be needed for effective diagnosis or treatment. Such detection would be helpful not only for patients not previously diagnosed with colon neoplasia but also in those cases where a patient has previously received or is currently receiving therapy for colon cancer; in such cases, the presence or absence of, or a change in the level of, the molecular markers set forth herein may be indicative that the subject is likely to have a relapse or a progressive or persistent colon cancer.

In certain aspects, the application provides molecular markers of colon neoplasia such as ColoUp1 through ColoUp8. In certain instances, these markers are secreted proteins such as ColoUp1, ColoUp2 and osteopontin, and are useful for detecting and diagnosing colon neoplasia. In other aspects, these markers may be transmembrane proteins such as ColoUp3 and may be useful as targets for imaging agents, e.g. as targets to label cells of a neoplasm.

In one aspect, the application provides isolated, purified or recombinant polypeptides having an amino acid sequence that is at least 90%, 95% or 98-99% identical to an amino acid sequence as set forth in SEQ ID Nos: 1-3 or an amino acid sequence as set forth in SEQ ID Nos: 13-20. In a more preferred embodiment, the application provides an amino acid sequence that is at least 90%, 95%, 98-99%, 99.3%, 99.5% or 99.7% identical to the amino acid sequence as set forth in SEQ ID No: 3 or SEQ ID No: 14. The application also provides fusion proteins comprising the ColoUp proteins described herein fused to a heterologous protein. In certain embodiments, such polypeptides are useful, for example, for generating antibodies or for use in screening assays to identify candidate therapeutics.

In other aspects, the application provides for nucleic acid sequences encoding the polypeptides as set forth in SEQ ID Nos: 1-3 and 13-20. In one aspect, the application provides nucleic acids comprising nucleic acid sequences that are at least 90%, 95%, 98-99%, 99.3%, 99.5% or 99.7% identical to the nucleic acid sequence in SEQ ID Nos: 4-12, more preferably 4-5. Also contemplated herein are vectors comprising the nucleic acid sequences set forth in SEQ ID Nos: 4-12, more preferably SEQ ID Nos: 4-5, and host cells expressing the nucleic acid sequences.

In another aspect, the application provides an antibody that interacts with an epitope specified by one of SEQ ID Nos: 1-3 and 13-20 or portions thereof, more preferably SEQ ID Nos: 1-3 or portions thereof. In a preferred embodiment, the antibody is useful for detecting colon adenomas and interacts with an epitope specified by one of SEQ ID Nos: 1-3. In certain aspects, the application provides for generating such antibodies, including methods for generating monoclonal and polyclonal antibodies, as well as methods for generating other types of antibodies. In other aspects, the application also provides a hybridoma cell line capable of producing an antibody that interacts with an epitope specified by SEQ ID Nos: 1-3 and 13-20, more preferably SEQ ID Nos: 1-3, or portions thereof. In yet other embodiments, the antibody may be a single chain antibody.

In yet other embodiments, the application provides a kit for detecting colon neoplasia in a biological sample. Such kits include one or more antibodies that are capable of interacting with an epitope specified by one of SEQ ID Nos: 1-3 and 13-20, more preferably with an epitope specified by one of SEQ ID Nos: 1-3. In more preferred embodiments, the antibodies may be detectably labeled, such as, for example, with an enzyme, a fluorescent substance, a chemiluminescent substance, a chromophore, a radioactive isotope or a complexing agent.

In certain embodiments, the application provides the identity of ColoUp1 and ColoUp2 polypeptides that are secreted into the serum in vivo, and that are secreted across the apical and basolateral cell surfaces in cultured intestinal cells. Accordingly, in certain embodiments, the application provides methods for detecting whether a subject is likely to have a colon neoplasia comprising: a) obtaining a biological sample from said subject; and b) detecting one or more polypeptides selected from among: one or more secreted ColoUp1 polypeptides and one or more secreted ColoUp2 polypeptides, wherein the presence of said one or more polypeptides is indicative of colon neoplasia.

In certain embodiments, a secreted ColoUp2 polypeptide is selected from among: a) a secreted polypeptide produced by the expression of a nucleic acid that is at least 95% identical to the nucleic acid sequence of SEQ ID No: 5; b) a secreted polypeptide produced by the expression of a nucleic acid that is a naturally occurring variant of SEQ ID No: 5; c) a secreted polypeptide produced by the expression of a nucleic acid that hybridizes under stringent conditions to a nucleic acid sequence of SEQ ID No: 5; d) a secreted polypeptide having a sequence that is at least 95% identical to the amino acid sequence of SEQ ID No: 3; and e) a secreted polypeptide having a sequence that is at least 95% identical to the amino acid sequence of SEQ ID No: 21. Optionally, the secreted ColoUp2 polypeptide is produced by the expression of a nucleic acid having the sequence of SEQ ID No: 5, and preferably the secreted ColoUp2 polypeptide is produced by the expression of a nucleic acid sequence that is at least 98%, 99% or 100% identical to the nucleic acid sequence of SEQ ID No: 5. In certain embodiments, the secreted ColoUp2 polypeptide has an amino acid sequence that is at least 98%, 99% or 100% identical to an amino acid sequence selected from among SEQ ID No: 3 and SEQ ID No: 21.
In certain embodiments, the secreted ColoUp1 polypeptide is selected from among: a) a secreted polypeptide produced by the expression of a nucleic acid that is at least 95% identical to the nucleic acid sequence of SEQ ID No: 4; b) a secreted polypeptide produced by the expression of a nucleic acid that is a naturally occurring variant of SEQ ID No: 4; c) a secreted polypeptide produced by the expression of a nucleic acid that hybridizes under stringent conditions to a nucleic acid sequence of SEQ ID No: 4; d) a secreted polypeptide having a sequence that is at least 95% identical to the amino acid sequence of SEQ ID No: 1; and e) a secreted polypeptide having a sequence that is at least 95% identical to the amino acid sequence of SEQ ID No: 2. Optionally, the secreted ColoUp1 polypeptide is produced by the expression of a nucleic acid having a sequence that is at least 95%, 98%, 99% or 100% identical to the nucleic acid sequence of SEQ ID No: 4. Preferably, the secreted ColoUp1 polypeptide has an amino acid sequence that is at least 95%, 98%, 99% or 100% identical to an amino acid sequence selected from among SEQ ID No: 1 and SEQ ID No: 2. Optionally, for detection of basolaterally secreted ColoUp1 or ColoUp2 polypeptides, the biological sample is a blood sample or a fraction derived from blood, such as serum, plasma, cells, or a fraction enriched for apically secreted ColoUp1 or ColoUp2 polypeptide. Optionally, for detection of basolaterally secreted ColoUp1 or ColoUp2 polypeptides, the biological sample is a urine sample or a fraction derived from urine. Optionally, for detection of apically secreted ColoUp1 or ColoUp2 polypeptides, the biological sample is derived from the inner wall and/or lumen of the intestinal tract, such as intestinal mucous or other fluid, excreted stool and stool removed from within the colon.
In certain embodiments, the polypeptide is detected by an assay that employs an antibody, such as an immunoprecipitation assay, a Western blot, a radioimmunoassay or an enzyme-linked immunosorbent assay (ELISA). Optionally, an assay comprises contacting the biological sample with an antibody that interacts with a secreted ColoUp1 polypeptide or a secreted ColoUp2 polypeptide. An antibody may, for example, interact with an epitope of an amino acid sequence selected from among: SEQ ID No: 1 and SEQ ID No: 2. An antibody may, for example, interact with an epitope of an amino acid sequence selected from among: SEQ ID No: 3 and SEQ ID No: 21. Optionally, the antibody is detectably labeled, such as with an enzyme, a fluorescent substance, a chemiluminescent substance, a chromophore, a radioactive isotope or a complexing agent. Optionally, the amount of at least one secreted ColoUp1 polypeptide and/or at least one secreted ColoUp2 polypeptide in the biological sample is compared to a predetermined standard (e.g., a known amount of purified ColoUp1 or ColoUp2 polypeptide). Optionally, the amount of at least one secreted ColoUp1 polypeptide and/or at least one secreted ColoUp2 polypeptide in the biological sample is compared to the subject's historical baseline. In certain embodiments, the presence of at least one secreted ColoUp1 polypeptide and/or at least one secreted ColoUp2 polypeptide is indicative that the subject is likely to harbor a colon adenoma or a colon cancer.
In certain embodiments, the presence of at least one secreted ColoUp1 polypeptide and/or at least one secreted ColoUp2 polypeptide may be used in determining the therapeutic protocol to be administered to a subject having a colon neoplasia, and the subject may not have been previously diagnosed with colon cancer or the subject may have previously received or is currently receiving a therapy for colon cancer, wherein the presence of at least one secreted ColoUp1 polypeptide and/or at least one secreted ColoUp2 polypeptide indicates that the subject is likely to have a relapse or a persistent or progressive colon cancer. The detection of said secreted polypeptide may indicate the presence of a variety of neoplasias in a subject, such as a colon adenoma, a colon cancer and a metastatic colon cancer. Optionally, a method involves detecting both at least one secreted ColoUp1 polypeptide and at least one secreted ColoUp2 polypeptide in the biological sample.

In certain embodiments, the application provides kits for detecting one or more molecular markers of colon neoplasia in a biological sample. A kit may comprise a) an antibody which interacts with an epitope of a secreted ColoUp1 polypeptide or a secreted ColoUp2 polypeptide; and b) instructions for use. Optionally, the antibody interacts with an epitope of a polypeptide selected from among: the polypeptide of SEQ ID No: 1, the polypeptide of SEQ ID No: 2, the polypeptide of SEQ ID No: 3 and the polypeptide of SEQ ID No: 21. Optionally, the antibody is detectably labeled.

In certain embodiments, the application provides a novel purified polypeptide, which is a portion of ColoUp2 that is found in serum. Such a polypeptide may consist essentially of an amino acid sequence that is at least 95%, 98%, 99% or 100% identical to the sequence of SEQ ID No: 21. By “consisting essentially” is meant that there may be, in addition to the indicated amino acid sequence, a variety of modifications, such as phosphorylations, glycosylations, disulfide bonds, unusual or modified amino acids, etc.

In certain embodiments, the application provides novel fusion proteins comprising a first polypeptide domain and a second polypeptide domain, wherein the first polypeptide domain consists essentially of an amino acid sequence that is at least 95%, 98%, 99% or 100% identical to an amino acid sequence of SEQ ID No: 21. The second polypeptide domain may be a domain selected from the group consisting of: a detection domain, a purification domain and an antigenic domain.

In certain embodiments, the application provides antibodies that bind specifically to a ColoUp2 polypeptide consisting essentially of the amino acid sequence of SEQ ID No: 21. The antibody may bind the ColoUp2 polypeptide with a dissociation constant of less than 10⁻⁶ M, 10⁻⁷ M, 10⁻⁸ M or 10⁻⁹ M. The antibody may be essentially any type of antibody, including polyclonal, monoclonal, and single chain antibodies, or other fragments. For diagnostic use, there may be little benefit to having a humanized antibody; however, humanized antibodies are highly desirable for therapeutic uses. Preferably, a diagnostic antibody is effective for detecting the ColoUp2 polypeptide in a biological sample, such as a blood, stool or urine sample, or a fraction thereof. Optionally, the antibody is effective for detecting the ColoUp2 polypeptide in a sample comprising cells from a colon neoplasia. The application further provides methods for making such antibodies in a variety of ways. For example, a monoclonal antibody may be produced in a method comprising: (a) administering to a mouse an amount of an immunogenic composition comprising the ColoUp2 polypeptide effective to stimulate a detectable immune response; (b) obtaining antibody-producing cells from the mouse and fusing the antibody-producing cells with myeloma cells to obtain antibody-producing hybridomas; (c) testing the antibody-producing hybridomas to identify a preferred hybridoma, wherein the preferred hybridoma is a hybridoma that produces a monoclonal antibody that binds specifically to the ColoUp2 polypeptide; (d) culturing the preferred hybridoma in a cell culture that produces the monoclonal antibody that binds specifically to the ColoUp2 polypeptide; and (e) obtaining the monoclonal antibody that binds specifically to the ColoUp2 polypeptide from the cell culture.
Optionally, testing the antibody-producing hybridomas comprises testing whether the antibody-producing hybridomas produce an antibody that binds to the ColoUp2 polypeptide in an assay selected from the group consisting of: an enzyme-linked immunosorbent assay, a Biacore assay and an immunoprecipitation assay.
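
The dissociation constant thresholds recited above can be illustrated with a short sketch. It assumes a simple 1:1 equilibrium binding model, in which the fraction of antibody binding sites occupied at a free antigen concentration [Ag] is [Ag] / (Kd + [Ag]); this model, the function names, and the example values are assumptions made for illustration and are not taken from this application.

```python
from typing import Optional

# Dissociation constant thresholds recited in the text, in mol/L.
KD_THRESHOLDS_M = (1e-6, 1e-7, 1e-8, 1e-9)


def tightest_threshold_met(kd_molar: float) -> Optional[float]:
    """Return the smallest recited threshold that the Kd is strictly below.

    A lower Kd means tighter binding. Returns None if the Kd is not
    below any of the recited thresholds.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    met = [t for t in KD_THRESHOLDS_M if kd_molar < t]
    return min(met) if met else None


def fraction_bound(free_antigen_molar: float, kd_molar: float) -> float:
    """Equilibrium occupancy under the assumed 1:1 binding model."""
    if kd_molar <= 0 or free_antigen_molar < 0:
        raise ValueError("Kd must be positive and concentration non-negative")
    return free_antigen_molar / (kd_molar + free_antigen_molar)


# Hypothetical antibody with Kd = 5 nM.
print(tightest_threshold_met(5e-9))   # below 1e-8 M but not below 1e-9 M
print(fraction_bound(5e-9, 5e-9))     # half occupancy when [Ag] equals Kd
```

Under this model, half of the binding sites are occupied when the free antigen concentration equals the Kd, which is why a lower Kd allows detection of the polypeptide at lower concentrations.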

The embodiments and practices of the present invention, other embodiments, and their features and characteristics, will be apparent from the description, figures and claims that follow, with all of the claims hereby being incorporated by this reference into this Summary.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the amino acid sequences (SEQ ID NOs: 1 and 2) of secreted ColoUp1 protein. A. An N-terminal signal peptide is cleaved between amino acids 30-31 of the full-length ColoUp1 protein; B. An N-terminal signal peptide is cleaved between amino acids 33-34 of the full-length ColoUp1 protein.

FIG. 2 shows the amino acid sequence (SEQ ID NO: 3) of secreted ColoUp2 protein.

FIG. 3 shows the nucleic acid sequence (SEQ ID NO: 4) of ColoUp1.

FIG. 4 shows the nucleic acid sequence (SEQ ID NO: 5) of ColoUp2.

FIG. 5 shows the nucleic acid sequence (SEQ ID NO: 6) of Osteopontin.

FIG. 6 shows the nucleic acid sequence (SEQ ID NO: 7) of ColoUp3.

FIG. 7 shows the nucleic acid sequence (SEQ ID NO: 8) of ColoUp4.

FIG. 8 shows the nucleic acid sequence (SEQ ID NO: 9) of ColoUp5.

FIG. 9 shows the nucleic acid sequence (SEQ ID NO: 10) of ColoUp6.

FIG. 10 shows the nucleic acid sequence (SEQ ID NO: 11) of ColoUp7.

FIG. 11 shows the nucleic acid sequence (SEQ ID NO: 12) of ColoUp8.

FIG. 12 shows the amino acid sequence (SEQ ID NO: 13) of full-length ColoUp1 protein.

FIG. 13 shows the amino acid sequence (SEQ ID NO: 14) of full-length ColoUp2 protein.

FIG. 14 shows the amino acid sequence (SEQ ID NO: 15) of full-length Osteopontin protein.

FIG. 15 shows the amino acid sequence (SEQ ID NO: 16) of full-length ColoUp3 protein.

FIG. 16 shows the amino acid sequence (SEQ ID NO: 17) of full-length ColoUp4 protein.

FIG. 17 shows the amino acid sequence (SEQ ID NO: 18) of full-length ColoUp5 protein.

FIG. 18 shows the amino acid sequence (SEQ ID NO: 19) of full-length ColoUp6 protein.

FIG. 19 shows the amino acid sequence (SEQ ID NO: 20) of full-length ColoUp8 protein.

FIG. 20 is a graphical display of ColoUp1 expression levels measured by micro-array profiling in different samples. A. In normal colon epithelial strips, normal liver, and colonic muscle; B. In premalignant colon adenomas as well as in colon cancers of Dukes stage B, Dukes stage C, and Dukes stage D; C. In colon cancer liver metastasis; D. In colon cancer cell lines, colon cancer xenografts grown in athymic mice, MSI cell lines, and V330 cell lines treated with TGFβ.

FIG. 21 is a graphical display of ColoUp2 expression levels measured by micro-array profiling in different samples. A. In normal colon epithelial strips, normal liver, and colonic muscle; B. In premalignant colon adenomas as well as in colon cancers of Dukes stage B, Dukes stage C, and Dukes stage D; C. In colon cancer liver metastasis; D. In colon cancer cell lines, colon cancer xenografts grown in athymic mice, MSI cell lines, and V330 cell lines treated with TGFβ.

FIG. 22 is a graphical display of Osteopontin expression levels measured by micro-array profiling in different samples. A. In normal colon epithelial strips, normal liver, and colonic muscle; B. In premalignant colon adenomas as well as in colon cancers of Dukes stage B, Dukes stage C, and Dukes stage D; C. In colon cancer liver metastasis; D. In colon cancer cell lines, colon cancer xenografts grown in athymic mice, MSI cell lines, and V330 cell lines treated with TGFβ.

FIG. 23 is a graphical display of ColoUp3 expression levels measured by micro-array profiling in different samples. A. In normal colon epithelial strips, normal liver, and colonic muscle; B. In premalignant colon adenomas as well as in colon cancers of Dukes stage B, Dukes stage C, and Dukes stage D; C. In colon cancer liver metastasis; D. In colon cancer cell lines, colon cancer xenografts grown in athymic mice, MSI cell lines, and V330 cell lines treated with TGFβ.

FIG. 24 is a graphical display of ColoUp4 expression levels measured by micro-array profiling in different samples. A. In normal colon epithelial strips, normal liver, and colonic muscle; B. In premalignant colon adenomas as well as in colon cancers of Dukes stage B, Dukes stage C, and Dukes stage D; C. In colon cancer liver metastasis; D. In colon cancer cell lines, colon cancer xenografts grown in athymic mice, MSI cell lines, and V330 cell lines treated with TGFβ.

FIG. 25 is a graphical display of ColoUp5 expression levels measured by micro-array profiling in different samples. A. In normal colon epithelial strips, normal liver, and colonic muscle; B. In premalignant colon adenomas as well as in colon cancers of Dukes stage B, Dukes stage C, and Dukes stage D; C. In colon cancer liver metastasis; D. In colon cancer cell lines, colon cancer xenografts grown in athymic mice, MSI cell lines, and V330 cell lines treated with TGFβ.

FIG. 26 is a graphical display of ColoUp6 expression levels measured by micro-array profiling in different samples. A. In normal colon epithelial strips, normal liver, and colonic muscle; B. In premalignant colon adenomas as well as in colon cancers of Dukes stage B, Dukes stage C, and Dukes stage D; C. In colon cancer liver metastasis; D. In colon cancer cell lines, colon cancer xenografts grown in athymic mice, MSI cell lines, and V330 cell lines treated with TGFβ.

FIG. 27 is a graphical display of ColoUp7 expression levels measured by micro-array profiling in different samples. A. In normal colon epithelial strips, normal liver, and colonic muscle; B. In premalignant colon adenomas as well as in colon cancers of Dukes stage B, Dukes stage C, and Dukes stage D; C. In colon cancer liver metastasis; D. In colon cancer cell lines, colon cancer xenografts grown in athymic mice, MSI cell lines, and V330 cell lines treated with TGFβ.

FIG. 28 is a graphical display of ColoUp8 expression levels measured by micro-array profiling in different samples. A. In normal colon epithelial strips, normal liver, and colonic muscle; B. In premalignant colon adenomas as well as in colon cancers of Dukes stage B, Dukes stage C, and Dukes stage D; C. In colon cancer liver metastasis; D. In colon cancer cell lines, colon cancer xenografts grown in athymic mice, MSI cell lines, and V330 cell lines treated with TGFβ.

FIG. 29 shows northern blot analysis of ColoUp1 mRNA levels in normal colon tissues and colon cancer cell lines or tissues. A. In normal colon tissue samples and a group of colon cancer cell lines; B. and C. In normal colon tissues and colon neoplasms from 15 individuals with colon cancers and one individual with a colon adenoma.

FIG. 30 shows detection of T7 epitope-tagged ColoUp1 protein levels in transfected FET cells and Vaco400 cells. A. Secretion of epitope-tagged ColoUp1 protein in V400 cell growth media by Western blot (“T” are transfectants with an epitope-tagged ColoUp1 expression vector; “C” are transfectants with an empty control vector); B. Expression of T7 epitope-tagged ColoUp1 protein in transfected FET cells and V400 cells by Western blot (left panel), and secretion of epitope-tagged ColoUp1 protein in growth media by serial immunoprecipitation and Western blot (right panel) (Cell extract amounts loaded: FET=75 mg/well; V400=31.1 mg/well; Volume of media used for immunoprecipitation=1 ml of 20 ml).

FIG. 31 shows northern blot analysis of ColoUp2 mRNA levels in normal colon tissue samples and a group of colon cancer cell lines (top panel). The bottom panel shows the ethidium bromide stained gel corresponding to the blot.

FIG. 32 shows detection of V5 epitope-tagged ColoUp2 protein levels in transfected SW480 cells and Vaco400 cells (24 hours and 48 hours after transfection). Expression of epitope-tagged ColoUp2 protein in transfected cells by Western blot (right panel), and secretion of epitope-tagged ColoUp2 protein in growth media by serial immunoprecipitation and Western blot (left panel).

FIG. 33 shows two northern blot analyses of ColoUp5 mRNA levels in normal colon tissues and a group of colon cancer cell lines (top panels). The bottom panels show the ethidium bromide stained gels corresponding to the blots.

FIG. 34 illustrates an alignment of the human, mouse, and rat ColoUp5 (FoxQ1) amino acid sequences.

FIG. 35 illustrates an alignment of the human, mouse, and rat ColoUp5 (FoxQ1) nucleic acid sequences.

FIG. 36 shows a western blot of V5 tagged ColoUp2 protein detected by anti-V5 antibody. Lane 1: media supernate from SW480 colon cancer cells transfected with an empty expression vector. Lane 2: media supernate from ColoUp2-V5 expressing cells. Lane 3: size markers. Lane 4 shows assay of serum from a mouse xenografted with control SW480 cells corresponding to lane 1. Lanes 5 and 6 show detection of circulating ColoUp2 proteins in blood from two mice bearing human colon cancer xenografts from ColoUp2-V5 expressing SW480 colon cells shown in lane 2. ColoUp2 is secreted as an 85 KD and a companion 55 KD size protein.

FIG. 37 shows a western blot with anti-V5 antibody of V5 tagged ColoUp1 protein. Lane 1: media supernate from SW480 colon cancer cells transfected with an empty expression vector. Lane 2: media supernate from ColoUp1-V5 expressing SW480 cells. Lane 3 shows assay of serum from a mouse xenografted with control SW480 cells corresponding to lane 1. Lane 4 shows detection of circulating ColoUp1 proteins in blood from a mouse bearing tumor xenografts from ColoUp1-V5 expressing SW480 cells shown in lane 2. Lane 5: size markers.

FIG. 38 shows, in the upper panel, the purification of ColoUp2 protein. Shown is a Coomassie blue staining of 250 ng (lane 2a) and 500 ng (lane 3a) of a purified ColoUp2 protein preparation. Size markers are in lane 1a. In the lower panel is shown a Coomassie blue stained gel showing purification of His-tagged ColoUp1 protein on Ni-NTA beads. Lane 1: markers. Lane 2: media from mock transfected cells. Lane 3: purification of media from ColoUp1 transfected cells. Clearly shown is purification to homogeneity of the 180 kd ColoUp1 protein.

FIG. 39 shows, in the top panel, detection on an anti-V5 western of V5-tagged ColoUp2 protein. Lane 1: media from mock transfected Caco2 cells. Lane 2: detection of secreted ColoUp2 protein from transiently transfected Caco2 cells grown in standard culture dishes. Seen are the typical 85 KD and 55 KD secreted bands (the lane is heavily overloaded and minor degradation products are also visualized). Lane 3: molecular weight markers. Lanes 4-7: detection of ColoUp2 secreted into the basolateral compartment (lower chamber) of transiently transfected Caco2 grown as a monolayer on a transwell filter. Lanes 9-12 show the general absence of ColoUp2 in the corresponding apical compartment, with the exception of the 48 hour time point. The table shows the electrical resistance and transfection efficiency (gfp expression) measured at each time point. A dip in the electrical resistance at 48 hours suggests some leakiness of the monolayer at that time point.

FIG. 40: Top panel shows detection on anti-V5 western of V5-tagged ColoUp1 protein. Control lane shows detection of purified recombinant ColoUp1. Identical bands are seen in media harvested on days 1-4 (lanes D1-D4) from both apical and basolateral compartments. The table shows the electrical resistance and transfection efficiency (gfp expression) measured at each time point.

FIG. 41 shows the amino acid sequence of the approximately 55 kD C-terminal fragment of ColoUp2 that is a prominent secreted and serum form of ColoUp2.

DETAILED DESCRIPTION

1. Definitions

For convenience, certain terms employed in the specification, examples, and appended claims are collected here. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The articles “a” and “an” are used herein to refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element.

The terms “adenoma”, “colon adenoma” and “polyp” are used herein to describe any precancerous neoplasia of the colon.

The term “antibody” as used herein is intended to include whole antibodies, e.g., of any isotype (IgG, IgA, IgM, IgE, etc.), and includes fragments thereof which are also specifically reactive with a vertebrate, e.g., mammalian, protein. Antibodies can be fragmented using conventional techniques and the fragments screened for utility and/or interaction with a specific epitope of interest. Thus, the term includes segments of proteolytically-cleaved or recombinantly-prepared portions of an antibody molecule that are capable of selectively reacting with a certain protein. Non-limiting examples of such proteolytic and/or recombinant fragments include Fab, F(ab′)₂, Fab′, Fv, and single chain antibodies (scFv) containing a V[L] and/or V[H] domain joined by a peptide linker. The scFv's may be covalently or non-covalently linked to form antibodies having two or more binding sites. The term antibody also includes polyclonal, monoclonal, or other purified preparations of antibodies and recombinant antibodies.

The term “colon” as used herein is intended to encompass the right colon (including the cecum), the transverse colon, the left colon and the rectum.

The terms “colorectal cancer” and “colon cancer” are used interchangeably herein to refer to any cancerous neoplasia of the colon (including the rectum, as defined above).

The term “ColoUpX” (e.g. ColoUp1, ColoUp2 . . . ColoUp8) is used to refer to a nucleic acid encoding a ColoUp protein or a ColoUp protein itself, as well as distinguishable fragments of such nucleic acids and proteins, longer nucleic acids and polypeptides that comprise distinguishable fragments or full length nucleic acids or polypeptides, and variants thereof. Variants include polypeptides that are at least 90% identical to the relevant human ColoUp SEQ ID Nos. referred to in the application, and nucleic acids encoding such variant polypeptides. In addition, variants include different post-translational modifications, such as glycosylations, methylations, etc. Particularly preferred variants include any naturally occurring variants, such as allelic differences, mutations that occur in a neoplasia and secreted or processed forms. The terms “variants” and “fragments” are overlapping.

As used herein, the phrase “gene expression” or “protein expression” includes any information pertaining to the amount of gene transcript or protein present in a sample, as well as information about the rate at which genes or proteins are produced or are accumulating or being degraded (e.g., reporter gene data, data from nuclear runoff experiments, pulse-chase data, etc.). Certain kinds of data might be viewed as relating to both gene and protein expression. For example, protein levels in a cell are reflective of the level of protein as well as the level of transcription, and such data is intended to be included by the phrase “gene or protein expression information”. Such information may be given in the form of amounts per cell, amounts relative to a control gene or protein, in unitless measures, etc.; the term “information” is not to be limited to any particular means of representation and is intended to mean any representation that provides relevant information. The term “expression levels” refers to a quantity reflected in or derivable from the gene or protein expression data, whether the data is directed to gene transcript accumulation or protein accumulation or protein synthesis rates, etc.

The term “detection” is used herein to refer to any process of observing a marker in a biological sample, whether or not the marker is actually detected. In other words, the act of probing a sample for a marker is a “detection” even if the marker is determined to be not present or below the level of sensitivity. Detection may be a quantitative, semi-quantitative or non-quantitative observation.

The terms “healthy”, “normal” and “non-neoplastic” are used interchangeably herein to refer to a subject or particular cell or tissue that is devoid (at least to the limit of detection) of a disease condition, such as a neoplasia, that is associated with increased expression of a ColoUp gene. These terms are often used herein in reference to tissues and cells of the colon. Thus, for the purposes of this application, a patient with severe heart disease but lacking a ColoUp-associated disease would be termed “healthy”.

The term “including” is used herein to mean, and is used interchangeably with, the phrase “including but not limited to”.

As used herein, the term “nucleic acid” refers to polynucleotides such as deoxyribonucleic acid (DNA), and, where appropriate, ribonucleic acid (RNA). The term should also be understood to include analogs of either RNA or DNA made from nucleotide analogs, and, as applicable to the embodiment being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.

The term “or” is used herein to mean, and is used interchangeably with, the term “and/or”, unless context clearly indicates otherwise.

The term “percent identical” refers to sequence identity between two amino acid sequences or between two nucleotide sequences. Identity can be determined by comparing a position in each sequence which may be aligned for purposes of comparison. When an equivalent position in the compared sequences is occupied by the same base or amino acid, then the molecules are identical at that position; when the equivalent site is occupied by the same or a similar amino acid residue (e.g., similar in steric and/or electronic nature), then the molecules can be referred to as homologous (similar) at that position. Expression as a percentage of homology/similarity or identity refers to a function of the number of identical or similar amino acids at positions shared by the compared sequences. Various alignment algorithms and/or programs may be used, including FASTA, BLAST or ENTREZ. FASTA and BLAST are available as a part of the GCG sequence analysis package (University of Wisconsin, Madison, Wis.), and can be used with, e.g., default settings. ENTREZ is available through the National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Md. In one embodiment, the percent identity of two sequences can be determined by the GCG program with a gap weight of 1, e.g., each amino acid gap is weighted as if it were a single amino acid or nucleotide mismatch between the two sequences.
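By way of illustration only, the position-by-position comparison described above may be sketched as follows. The sketch assumes that the two sequences have already been aligned by some other program (equal length, with gaps written as "-"), and it counts each gapped position as a single mismatch, in the spirit of the gap weight of 1 mentioned above. The function name and the choice of denominator (all alignment columns that are not gaps in both sequences) are assumptions made for this example and do not reproduce the GCG program.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over the columns of an existing pairwise alignment.

    Assumptions (illustrative only): the inputs are pre-aligned strings of
    equal length, and a column with a gap in one sequence counts as one
    mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    columns = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # a column that is a gap in both sequences carries no information
        columns += 1
        if a == b:
            matches += 1  # same base or amino acid at the equivalent position

    if columns == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Five of seven columns match (one gap, one substitution).
    print(round(percent_identity("MKT-AYI", "MKTGAYL"), 1))
```

Under this sketch, a variant polypeptide would meet a 90% identity threshold when the returned value is at least 90.0.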

The terms “polypeptide” and “protein” are used interchangeably herein.

The term “purified protein” refers to a preparation of a protein or proteins which are preferably isolated from, or otherwise substantially free of, other proteins normally associated with the protein(s) in a cell or cell lysate. The term “substantially free of other cellular proteins” (also referred to herein as “substantially free of other contaminating proteins”) is defined as encompassing individual preparations of each of the component proteins comprising less than 20% (by dry weight) contaminating protein, and preferably comprises less than 5% contaminating protein. Functional forms of each of the component proteins can be prepared as purified preparations by using a cloned gene as described in the attached examples. By “purified”, it is meant, when referring to component protein preparations used to generate a reconstituted protein mixture, that the indicated molecule is present in the substantial absence of other biological macromolecules, such as other proteins (particularly other proteins which may substantially mask, diminish, confuse or alter the characteristics of the component proteins either as purified preparations or in their function in the subject reconstituted mixture). The term “purified” as used herein preferably means at least 80% by dry weight, more preferably in the range of 85% by weight, more preferably 95-99% by weight, and most preferably at least 99.8% by weight, of biological macromolecules of the same type present (but water, buffers, and other small molecules, especially molecules having a molecular weight of less than 5000, can be present). The term “pure” as used herein preferably has the same numerical limits as “purified” immediately above.

A “recombinant nucleic acid” is any nucleic acid that has been placed adjacent to another nucleic acid by recombinant DNA techniques. A “recombinant nucleic acid” also includes any nucleic acid that has been placed next to a second nucleic acid by a laboratory genetic technique such as, for example, transformation and integration, transposon hopping or viral insertion. In general, a recombined nucleic acid is not naturally located adjacent to the second nucleic acid.

The term “recombinant protein” refers to a protein that is produced by expression from a recombinant nucleic acid.

A “sample” includes any material that is obtained or prepared for detection of a molecular marker, or any material that is contacted with a detection reagent or detection device for the purpose of detecting a molecular marker.

A “subject” is any organism of interest, generally a mammalian subject, such as a mouse, and preferably a human subject.

2. Overview

In certain aspects, the invention relates to methods for determining whether a subject is likely or unlikely to have a colon neoplasia. In other aspects, the invention relates to methods for determining whether a patient is likely or unlikely to have a colon cancer. In further aspects, the invention relates to methods for monitoring colon neoplasia in a subject. In further aspects, the invention relates to methods for staging a subject's colon neoplasia. A colon neoplasia is any cancerous or precancerous growth located in, or derived from, the colon. The colon is a portion of the intestinal tract that is roughly three feet in length, stretching from the end of the small intestine to the rectum. Viewed in cross section, the colon consists of four distinguishable layers arranged in concentric rings surrounding an interior space, termed the lumen, through which digested materials pass. In order, moving outward from the lumen, the layers are termed the mucosa, the submucosa, the muscularis propria and the subserosa. The mucosa includes the epithelial layer (cells adjacent to the lumen), the basement membrane, the lamina propria and the muscularis mucosae. In general, the “wall” of the colon is intended to refer to the submucosa and the layers outside of the submucosa. The “lining” is the mucosa.

Precancerous colon neoplasias are referred to as adenomas or adenomatous polyps. Adenomas are typically small mushroom-like or wart-like growths on the lining of the colon and do not invade into the wall of the colon. Adenomas may be visualized through a device such as a colonoscope or flexible sigmoidoscope. Several studies have shown that patients who undergo screening for and removal of adenomas have a decreased rate of mortality from colon cancer. For this and other reasons, it is generally accepted that adenomas are an obligate precursor for the vast majority of colon cancers.

When a colon neoplasia invades into the basement membrane of the colon, it is considered a colon cancer, as the term “colon cancer” is used herein. In describing colon cancers, this specification will generally follow the so-called “Dukes” colon cancer staging system. Other staging systems have been devised, and the particular system selected is, for the purposes of this disclosure, unimportant. The characteristics that describe a cancer are of greater significance than the particular term used to describe a recognizable stage. The most widely used staging systems generally use at least one of the following characteristics for staging: the extent of tumor penetration into the colon wall, with greater penetration generally correlating with a more dangerous tumor; the extent of invasion of the tumor through the colon wall and into other neighboring tissues, with greater invasion generally correlating with a more dangerous tumor; the extent of invasion of the tumor into the regional lymph nodes, with greater invasion generally correlating with a more dangerous tumor; and the extent of metastatic invasion into more distant tissues, such as the liver, with greater metastatic invasion generally correlating with a more dangerous disease state.

“Dukes A” and “Dukes B” colon cancers are neoplasias that have invaded into the wall of the colon but have not spread into other tissues. Dukes A colon cancers are cancers that have not invaded beyond the submucosa. Dukes B colon cancers are subdivided into two groups: “Dukes B1” and “Dukes B2”. “Dukes B1” colon cancers are neoplasias that have invaded up to but not through the muscularis propria. Dukes B2 colon cancers are cancers that have breached completely through the muscularis propria. Over a five year period, patients with Dukes A cancer who receive surgical treatment (i.e. removal of the affected tissue) have a greater than 90% survival rate. Over the same period, patients with Dukes B1 and Dukes B2 cancer receiving surgical treatment have a survival rate of about 85% and 75%, respectively. Dukes A, B1 and B2 cancers are also referred to as T1, T2 and T3-T4 cancers, respectively.

“Dukes C” colon cancers are cancers that have spread to the regional lymph nodes, such as the lymph nodes of the gut. Patients with Dukes C cancer who receive surgical treatment alone have a 35% survival rate over a five year period, but this survival rate is increased to 60% in patients that receive chemotherapy.

“Dukes D” colon cancers are cancers that have metastasized to other organs. The liver is the most common organ in which metastatic colon cancer is found. Patients with Dukes D colon cancer have a survival rate of less than 5% over a five year period, regardless of the treatment regimen.
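For reference, the staging terms used in this specification may be collected in a small lookup table. The following is a minimal sketch; the class and variable names are hypothetical, and the descriptions, T categories and five-year survival figures are simply those recited in the preceding paragraphs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DukesStage:
    extent: str                 # extent of invasion or spread, as described in the text
    t_category: Optional[str]   # T designation where the text gives one
    five_year_survival: str     # survival figure as recited in the text


# Values are taken from the staging paragraphs of this section.
DUKES_STAGES = {
    "A": DukesStage("not invaded beyond the submucosa", "T1",
                    "greater than 90% with surgical treatment"),
    "B1": DukesStage("invaded up to but not through the muscularis propria", "T2",
                     "about 85% with surgical treatment"),
    "B2": DukesStage("breached completely through the muscularis propria", "T3-T4",
                     "about 75% with surgical treatment"),
    "C": DukesStage("spread to the regional lymph nodes", None,
                    "35% with surgery alone; 60% with chemotherapy"),
    "D": DukesStage("metastasized to other organs", None,
                    "less than 5% regardless of treatment regimen"),
}


def describe(stage: str) -> str:
    """Return a one-line summary for a Dukes stage label such as 'B1'."""
    info = DUKES_STAGES[stage]
    t_part = f" ({info.t_category})" if info.t_category else ""
    return f"Dukes {stage}{t_part}: {info.extent}; five-year survival {info.five_year_survival}"


if __name__ == "__main__":
    for label in DUKES_STAGES:
        print(describe(label))
```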

As noted above, early detection of colon neoplasia, coupled with appropriate intervention, is important for increasing patient survival rates. Present systems for screening for colon neoplasia are deficient for a variety of reasons, including a lack of specificity or sensitivity (e.g. Fecal Occult Blood Test, flexible sigmoidoscopy) or a high cost and intensive use of medical resources (e.g. colonoscopy). Alternative systems for detection of colon neoplasia would be useful in a wide range of other clinical circumstances as well. For example, patients who receive surgical or pharmaceutical therapy for colon cancer may experience a relapse. It would be advantageous to have an alternative system for determining whether such patients have a recurrent or relapsed colon neoplasia. As a further example, an alternative diagnostic system would facilitate monitoring an increase, decrease or persistence of colon neoplasia in a patient known to have a colon neoplasia. A patient undergoing chemotherapy may be monitored to assess the effectiveness of the therapy.

Accordingly, in certain embodiments, the invention provides molecular markers that distinguish between cells that are not part of a colon neoplasia, referred to herein as “healthy cells”, and cells that are part of a colon neoplasia (e.g. an adenoma or a colon cancer), referred to herein as “colon neoplasia cells”. Certain molecular markers of the invention, including ColoUp1 and ColoUp2, are expressed at significantly higher levels in adenomas, Dukes A, Dukes B1, Dukes B2 and metastatic colon cancer of the liver (liver metastases) than in healthy colon tissue, healthy liver or healthy colon muscle. Certain molecular markers, including ColoUp1 and ColoUp2, are expressed at significantly higher levels in cell lines derived from colon cancer or cell lines engineered to imitate an aspect of a colon cancer cell. Particularly preferred molecular markers of the invention are markers that distinguish between healthy cells and cells of an adenoma. While not wishing to be bound to theory, it is contemplated that because adenomas are thought to be an obligate precursor for greater than 90% of colon cancers, markers that distinguish between healthy cells and cells of an adenoma are particularly valuable for screening apparently healthy patients to determine whether the patient is at increased risk for (predisposed to) developing a colon cancer. Furthermore, particularly preferred molecular markers are those that are actually present in the serum of an animal having a colon neoplasia, and, in general, a secreted protein will occur in the serum only if it is secreted from a cell contacting a blood vessel, or a compartment in diffusional contact with a blood vessel. For example, protein secreted from a large or advanced colon cancer will generally be found in the blood stream, but a protein secreted from a colon adenoma may not be present in the blood unless it is secreted from the basolateral face of the cell. Molecular markers that occur in the urine are generally derived from a polypeptide that is present in the blood. Optionally, a molecular marker is one that is present in the lumen of the colon (e.g., may be found in the intestinal mucous or in stool samples), and such a marker will generally be one that is secreted from the apical face of a cell.

In certain embodiments, the invention provides methods for using ColoUp molecular markers for determining whether a patient has or does not have a condition characterized by increased expression of one or more ColoUp nucleic acids or proteins described herein. In certain embodiments, the invention provides methods for determining whether a patient is or is not likely to have a colon neoplasia. In further embodiments, the invention provides methods for determining whether the patient is having a relapse or determining whether a patient's colon neoplasia is responding to treatment.

3. Methods for Identifying Candidate Molecular Markers for Colon Neoplasia

In certain aspects, the invention relates to the observation that when gene expression data is analyzed using carefully selected criteria, the likelihood of identifying strong candidate molecular markers of a colon neoplasia is quite high. Accordingly, in certain embodiments, the invention provides methods and criteria for analyzing gene expression data to identify candidate molecular markers for colon neoplasia. Although methods and criteria of the invention may be applied to essentially any relevant gene expression data, the benefits of using the inventive methods and criteria are readily apparent when applied to the copious data produced by highly parallel gene expression measurement systems, such as microarray systems. The human genome is estimated to be capable of producing roughly 20,000 to 100,000 different gene transcripts, thousands of which may show a change in expression level in healthy cells versus colon neoplasia cells. It is relatively cost-effective to obtain large quantities of gene expression data and to use this data to identify thousands of candidate molecular markers. However, a significant amount of labor intensive experimentation is generally needed to move from the identification of a candidate molecular marker to an effective diagnostic test for a health condition of interest. In fact, as of the time of filing of this application, the resources required to generate a diagnostic test from a single candidate molecular marker identified by gene expression data are large enough that it is essentially impossible to extract commercially valuable and clinically useful diagnostics from a list of hundreds or thousands of genes whose expression levels change in a particular situation. Accordingly, there is a substantial practical value in being able to select a small number (e.g. ten or fewer) of high-quality molecular markers for further study.

In certain embodiments, candidate molecular markers for colon neoplasia may be selected by comparing gene expression in liver metastatic colon cancer samples (“liver mets”), normal (non-neoplastic) colon samples and normal liver samples. In this embodiment, candidate molecular markers are those genes (and their gene products) that have a level of expression in liver mets (assessed as a median expression level across the sample set) that is at least four times greater than the level of expression in normal colon samples (also assessed as a median expression level across the sample set). Furthermore, in this embodiment, the median level of expression in liver mets should be greater than the median level of expression in normal liver samples. The criteria employed in this embodiment provide a high threshold to eliminate most lower quality markers and further eliminate contaminants from liver tissue.
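The criterion of this embodiment may be sketched for a single gene as follows. This is an illustrative sketch only: the function name is hypothetical, and the expression values are assumed to be non-negative measurements on a linear (not log-transformed) scale, one value per sample.

```python
from statistics import median


def passes_liver_met_criterion(liver_mets, normal_colon, normal_liver, fold=4.0):
    """Apply the median-based criterion of this embodiment to one gene.

    Assumption: each argument is a sequence of linear-scale expression
    values for the same gene, one value per sample in that sample set.
    """
    met_median = median(liver_mets)
    # Median in liver mets must be at least `fold` times the normal colon median.
    high_over_colon = met_median >= fold * median(normal_colon)
    # Median in liver mets must also exceed the normal liver median, which
    # screens out signal contributed by contaminating liver tissue.
    above_liver = met_median > median(normal_liver)
    return high_over_colon and above_liver


if __name__ == "__main__":
    # Invented example values for a hypothetical gene.
    print(passes_liver_met_criterion([800, 1200, 1000], [100, 200, 150], [300, 500, 400]))
```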

In certain embodiments, candidate molecular markers for colon neoplasia may be selected by comparing gene expression in normal colon to gene expression in a plurality of different cell lines cultured from metastatic colon cancer samples. For example, median metastatic colon cancer cell line gene expression may be calculated as the median of 8 colon cancer cell lines of the Vaco colon cancer cell line series (Markowitz, S. et al. Science. 268: 1336-1338, 1995), such as the following liver metastases-derived cell lines: V394, V576, V241, V9M, V400, V10M, V503, V786. In embodiments employing this criterion, candidate molecular markers are those genes (and their gene products) that have at least a three-fold higher median level of expression across the cell lines tested than in the normal colon tissue.

In certain embodiments, candidate molecular markers for colon neoplasia may be selected by comparing gene expression in normal colon to gene expression in a plurality of colon cancer xenografts grown in athymic mice (“xenografts”). In embodiments employing this criterion, candidate molecular markers are those genes (and their gene products) that have at least a four-fold higher median level of expression across the xenografts tested than in the normal colon tissue.
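The cell line and xenograft criteria differ only in the sample panel and the fold threshold, so both may be expressed with one table-driven check. The sketch below is illustrative; the names are hypothetical, the thresholds are the three-fold and four-fold values recited above, and the normal colon median is assumed to be positive so that a ratio can be formed.

```python
from statistics import median

# Fold thresholds as recited in the text: three-fold for cell lines,
# four-fold for xenografts grown in athymic mice.
PANEL_THRESHOLDS = {"cell_lines": 3.0, "xenografts": 4.0}


def median_fold_over_normal(panel_values, normal_colon_values):
    """Ratio of the panel median to the normal colon median for one gene."""
    normal = median(normal_colon_values)
    if normal <= 0:
        # Assumption: linear-scale data with a positive normal colon median.
        raise ValueError("normal colon median must be positive to form a ratio")
    return median(panel_values) / normal


def passes_panel_criterion(panel, panel_values, normal_colon_values):
    """True if the gene meets the fold threshold for the named panel."""
    fold = median_fold_over_normal(panel_values, normal_colon_values)
    return fold >= PANEL_THRESHOLDS[panel]


if __name__ == "__main__":
    # Invented values for one gene measured across eight cell lines.
    lines = [300, 320, 350, 400, 420, 450, 500, 600]
    colon = [100, 120, 140]
    print(median_fold_over_normal(lines, colon))
    print(passes_panel_criterion("cell_lines", lines, colon))
```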

In certain embodiments, candidate molecular markers for colon neoplasia may be selected by comparing maximum gene expression in normal colon to minimum gene expression in liver mets. In these embodiments, candidate molecular markers are those genes (and their gene products) that have a minimum gene expression in liver mets that is at least equal to the maximum gene expression in normal colon. Furthermore, in this embodiment, the median level of expression in liver mets should be greater than the median level of expression in normal liver samples.
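Unlike the median-based criteria, this criterion requires that the two sample sets do not overlap at all, so a single low-expressing liver met sample or a single high-expressing normal colon sample is enough to exclude a gene. A minimal sketch follows, with a hypothetical function name and the same assumption of linear-scale values, one per sample.

```python
from statistics import median


def passes_min_max_criterion(liver_mets, normal_colon, normal_liver):
    """Apply the minimum/maximum criterion of this embodiment to one gene."""
    # Lowest liver met value must be at least the highest normal colon value.
    no_overlap = min(liver_mets) >= max(normal_colon)
    # The liver met median must still exceed the normal liver median.
    above_liver = median(liver_mets) > median(normal_liver)
    return no_overlap and above_liver


if __name__ == "__main__":
    # Invented values: one low liver met sample (100) overlaps normal colon.
    print(passes_min_max_criterion([100, 1000, 1200], [50, 100, 150], [100, 150, 200]))
```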

In a preferred embodiment, a list of candidate molecular markers for colon neoplasia is selected by first identifying a subset of genes having a four-fold greater median expression in liver mets than in normal colon and in normal liver. This subset is then further narrowed to a final list by identifying those genes that have a three-fold greater median expression across colon cancer cell lines than in normal colon. Optionally, a particularly preferred list may be generated by further selecting those genes having a minimum gene expression in liver mets that is greater than or equal to the maximum gene expression in normal colon. The gene products (e.g. proteins and nucleic acids) of the short list of genes generated in these preferred embodiments constitute a list of high-quality candidate molecular markers for colon cancer.
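The successive narrowing described in this preferred embodiment may be sketched as a filter over a table of genes. The sketch is illustrative only: the data layout (a dictionary mapping each gene to per-group lists of linear-scale expression values), the group keys, the gene labels and the function name are all assumptions made for the example, and the first step is read as requiring a four-fold median excess over both normal colon and normal liver.

```python
from statistics import median


def shortlist(expression, strict=False):
    """Return genes passing the preferred-embodiment filters, in input order.

    `expression` maps gene -> {"liver_mets": [...], "normal_colon": [...],
    "normal_liver": [...], "cell_lines": [...]} (hypothetical layout).
    With strict=True, the optional minimum/maximum filter is also applied.
    """
    selected = []
    for gene, groups in expression.items():
        mets = groups["liver_mets"]
        colon = groups["normal_colon"]
        liver = groups["normal_liver"]
        lines = groups["cell_lines"]

        # Step 1: four-fold greater median in liver mets than in normal colon
        # and in normal liver.
        if median(mets) < 4 * median(colon):
            continue
        if median(mets) < 4 * median(liver):
            continue
        # Step 2: three-fold greater median across cell lines than in normal colon.
        if median(lines) < 3 * median(colon):
            continue
        # Optional step 3: minimum in liver mets >= maximum in normal colon.
        if strict and min(mets) < max(colon):
            continue
        selected.append(gene)
    return selected


if __name__ == "__main__":
    # Invented data for four hypothetical genes.
    data = {
        "GENE_A": {"liver_mets": [900, 1000, 1100], "normal_colon": [50, 100, 150],
                   "normal_liver": [100, 200, 150], "cell_lines": [400, 500, 600]},
        "GENE_B": {"liver_mets": [100, 1000, 1200], "normal_colon": [50, 100, 150],
                   "normal_liver": [100, 150, 200], "cell_lines": [400, 500, 600]},
        "GENE_C": {"liver_mets": [900, 1000, 1100], "normal_colon": [50, 100, 150],
                   "normal_liver": [800, 900, 1000], "cell_lines": [400, 500, 600]},
        "GENE_D": {"liver_mets": [900, 1000, 1100], "normal_colon": [50, 100, 150],
                   "normal_liver": [100, 150, 200], "cell_lines": [100, 150, 200]},
    }
    print(shortlist(data))
    print(shortlist(data, strict=True))
```

Applying the inexpensive median filters first and the stricter minimum/maximum filter last mirrors the order given in the text; because every filter is a simple pass or fail test per gene, the order does not change which genes remain.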

In another preferred embodiment, a list of candidate molecular markers for colon neoplasia is selected by first identifying a subset of genes having a four-fold greater median expression in liver mets than in normal colon and in normal liver. This subset is then further narrowed by identifying those genes that have a nine-fold greater median expression in liver mets than in normal colon. This subset is then further narrowed to a final list by identifying those genes that have a four-fold greater median expression across colon cancer cell lines than in normal colon. The gene products (e.g. proteins and nucleic acids) of the short list of genes generated in these preferred embodiments constitute a list of high-quality candidate molecular markers for colon cancer.

Depending on the nature of the intended use for the molecular marker, it may be desirable to add further criteria to any of the preceding embodiments. In certain embodiments, the invention relates to candidate molecular markers for categorizing a patient as likely to have or not likely to have a colon neoplasia (including adenomas and colon cancers), and in these embodiments, a high-quality candidate molecular marker will be expressed from a gene having an increased expression in both adenomas and liver mets relative to normal colon, and preferably in other colon cancer stages, including Dukes A, Dukes B1, Dukes B2 and Dukes C. In certain embodiments, the invention relates to candidate molecular markers for categorizing a patient as likely to have or not likely to have a colon cancer (including metastatic and non-metastatic forms), and in these embodiments, a high-quality candidate molecular marker will be expressed from a gene having an increased expression in liver mets relative to adenomas and normal colon, and preferably there will be elevated expression in other colon cancer stages, including Dukes A, Dukes B1, Dukes B2 and Dukes C. In certain embodiments, the invention relates to candidate molecular markers for categorizing a patient as likely or not likely to have a metastatic colon cancer, and in such embodiments, a comparison to gene expression in other colon neoplasias (e.g. adenomas, Dukes A, Dukes B1, Dukes B2, Dukes C), while potentially useful, is not necessary, although it is noted that expression in non-metastatic states may indicate that a candidate molecular marker is not of high quality for distinguishing metastatic colon cancer from non-metastatic states.

Furthermore, in those embodiments pertaining to molecular markers to be used for detection in a body fluid, such as blood, a high quality molecular marker will preferably be a secreted protein. In those embodiments pertaining to neoplasia identification or targeting, a high quality molecular marker will preferably be a protein with a portion adherent to and exposed on the extracellular surface of a neoplasia, such as a transmembrane protein with a significant extracellular portion.

Gene expression data may be gathered using one or more of the many known and appropriate techniques that, in view of this specification, may be selected by one of skill in the art. In certain preferred embodiments, gene expression data is gathered by a highly parallel system, meaning a system that allows simultaneous or near-simultaneous collection of expression data for one hundred or more gene transcripts. Exemplary highly parallel systems include probe arrays (“arrays”) that are often divided into microarrays and macroarrays, where microarrays have a much higher density of individual probe species per area. Arrays generally consist of a surface to which probes that correspond in sequence to gene products (e.g., cDNAs, mRNAs, oligonucleotides) are bound at known positions. The probes can be, e.g., a synthetic oligomer, a full-length cDNA, a less-than full length cDNA, or a gene fragment. Usually a microarray will have probes corresponding to at least 100 gene products and more preferably, 500, 1000, 4000 or more. Probes may be small oligomers or larger polymers, and there may be a plurality of overlapping or non-overlapping probes for each transcript.

The nucleic acids to be contacted with the microarray may be prepared in a variety of ways. Methods for preparing total and poly(A)+ RNA are well known and are described generally in Sambrook et al., supra. Labeled cDNA may be prepared from mRNA by oligo dT-primed or random-primed reverse transcription, both of which are well known in the art (see e.g., Klug and Berger, 1987, Methods Enzymol. 152:316-325). cDNAs may be labeled by incorporation of labeled nucleotides or by labeling after synthesis. Preferred labels are fluorescent labels.

Nucleic acid hybridization and wash conditions are chosen so that the population of labeled nucleic acids will specifically hybridize to appropriate, complementary probes affixed to the matrix. Optimal hybridization conditions will depend on the length (e.g., oligomer versus polynucleotide greater than 200 bases) and type (e.g., RNA, DNA, PNA) of labeled nucleic acids and immobilized polynucleotide or oligonucleotide. General parameters for specific (i.e., stringent) hybridization conditions for nucleic acids are described in Sambrook et al., supra, and in Ausubel et al., 1987, Current Protocols in Molecular Biology, Greene Publishing and Wiley-Interscience, New York, which is incorporated in its entirety for all purposes. Non-specific binding of the labeled nucleic acids to the array can be decreased by treating the array with a large quantity of non-specific DNA, a so-called “blocking” step.

Signals, such as fluorescent emissions for each location on an array, are generally recorded, quantitated and analyzed using a variety of computer software. Signal for any one gene product may be normalized by a variety of different methods. Arrays preferably include control and reference probes. Control probes are nucleic acids which serve to indicate that the hybridization was effective. Reference probes allow the normalization of results from one experiment to another and the comparison of multiple experiments on a quantitative level. Reference probes are typically chosen to correspond to genes that are expressed at a relatively constant level across different cell types and/or across different culture conditions. Exemplary reference nucleic acids include housekeeping genes of known expression levels, e.g., GAPDH, hexokinase and actin.
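
As an illustration of how reference probes permit comparison between experiments, the following minimal sketch divides every signal on an array by the mean signal of that array's reference probes. Dividing by the mean reference signal is only one simple choice among the many normalization methods mentioned above; the function name and the signal values are hypothetical.

```python
from statistics import mean

def normalize_to_reference(signals, reference_probes):
    """Scale each probe signal by the mean signal of the reference probes.

    signals maps probe name -> raw signal for one array (one experiment).
    reference_probes lists probe names assumed to be stably expressed.
    """
    reference_level = mean(signals[name] for name in reference_probes)
    if reference_level <= 0:
        raise ValueError("reference probes gave no usable signal")
    return {name: value / reference_level for name, value in signals.items()}

# Two hypothetical arrays of the same sample; the second was scanned twice as bright.
array_1 = {"GAPDH": 1000, "actin": 3000, "geneX": 500}
array_2 = {"GAPDH": 2000, "actin": 6000, "geneX": 1000}

norm_1 = normalize_to_reference(array_1, ["GAPDH", "actin"])
norm_2 = normalize_to_reference(array_2, ["GAPDH", "actin"])
```

After normalization the value for geneX is the same on both arrays, so the overall brightness difference between the two experiments no longer affects the comparison.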

Following the data gathering operation, the data will typically be reported to a data analysis system. To facilitate data analysis, the data obtained by the reader from the device will typically be analyzed using a digital computer. Typically, the computer will be appropriately programmed for receipt and storage of the data from the device, as well as for analysis and reporting of the data gathered, e.g., subtraction of the background, deconvolution of multi-color images, flagging or removing artifacts, verifying that controls have performed properly, normalizing the signals, interpreting fluorescence data to determine the amount of hybridized target, normalization of background and single base mismatch hybridizations, and the like. Various analysis methods that may be employed in such a data analysis system, or by a separate computer, are described herein.

A number of methods for constructing or using arrays are described in the following references: Schena et al., 1995, Science 270:467-470; DeRisi et al., 1996, Nature Genetics 14:457-460; Shalon et al., 1996, Genome Res. 6:639-645; Schena et al., 1995, Proc. Natl. Acad. Sci. USA 93:10539-11286; Fodor et al. 1991, Science 251:767-773; Pease et al., 1994, Proc. Natl. Acad. Sci. USA 91:5022-5026; Lockhart et al., 1996, Nature Biotech 14:1675; U.S. Pat. Nos. 6,051,380; 6,083,697; 5,578,832; 5,599,695; 5,593,839; 5,631,734; 5,556,752; 5,510,270; EP No. 0 799 897; PCT No. WO 97/29212; PCT No. WO 97/27317; EP No. 0 785 280; PCT No. WO 97/02357; EP No. 0 728 520; EP No. 0 721 016; PCT No. WO 95/22058.

A variety of companies provide microarrays and software for extracting certain information from microarray data. Such companies include Affymetrix (Santa Clara, Calif.), GeneLogic (Gaithersburg, Md.) and Eos Biotechnology Inc. (South San Francisco, Calif.).

While the above discussion focuses on the use of arrays for the collection of gene expression data, such data may also be obtained through a variety of other methods that, in view of this specification, are known to one of skill in the art. Such methods include the serial analysis of gene expression (SAGE) technique, first described in Velculescu et al. (1995) Science 270, 484-487. Reverse transcriptase polymerase chain reaction (RT-PCR) may be used, particularly in combination with fluorescent probe systems such as the Taqman™ fluorescent probe system. Numerous RT-PCR samples can be analyzed simultaneously by conducting parallel PCR amplification, e.g., by multiplex PCR. Further techniques include dot blot analysis and related methods (see, e.g., G. A. Beltz et al., in Methods in Enzymology, Vol. 100, Part B, R. Wu, L. Grossman, K. Moldave, Eds., Academic Press, New York, Chapter 19, pp. 266-308, 1985); Northern blots and in situ hybridization (probing a tissue sample directly).

The quality and biological relevance of gene expression data will be significantly affected by the quality of the biological material used to obtain gene expression. In preferred embodiments, the methods described herein for identifying candidate molecular markers for colon neoplasia employ tissue samples obtained with appropriate consent from human patients and rapidly frozen. At a point prior to gene expression analysis, the tissue sample is preferably prepared by carefully dissecting away as much heterogeneous tissue as is possible with the available tools. In other words, for a colon cancer sample, adherent non-cancerous tissue should be dissected away, to the extent that it is possible. In preferred embodiments, healthy tissue is obtained from a subject that has a colon neoplasia but is tissue that is not directly entangled in a neoplasia.

Example 1, below, illustrates the operation of a method of selecting high-quality molecular markers, and the following markers were selected, using criteria disclosed herein, from microarray expression data: ColoUp1, ColoUp2, ColoUp3, ColoUp4, ColoUp5, ColoUp6, ColoUp7 and ColoUp8. In addition, osteopontin was identified as having expression characteristics very similar to those identified using the selection criteria. Further experimentation (see Examples) demonstrated that these molecular markers fall into four categories: “secreted” (ColoUp1, ColoUp2 and osteopontin), “transmembrane” (ColoUp3), “transcription factors” (ColoUp4, ColoUp5) and “other” (ColoUp6, ColoUp7, ColoUp8). Further experimentation also demonstrated that ColoUp1, ColoUp2, ColoUp3, ColoUp5 and ColoUp7 are, generally speaking, expressed at higher levels in a variety of colon neoplasias (adenomas, Dukes B tumors, Dukes C tumors and liver mets) than in healthy cells. In addition, further experimentation demonstrated that osteopontin is overexpressed in colon cancers (Dukes B, Dukes C and liver mets) relative to adenomas and normal colon.

In certain embodiments, a preferred molecular marker for use in a diagnostic test that employs a body fluid sample, such as a blood or urine sample, or an excreted sample material, such as stool, is a secreted protein, such as the secreted portion of a ColoUp1 protein, ColoUp2 protein or osteopontin protein.

In certain embodiments, a preferred molecular marker for a method that involves targeting or marking a colon neoplasia is a transmembrane protein, such as ColoUp3, and particularly the extracellular portion of ColoUp3. Transmembrane proteins are desirable for such methods because they are both anchored to the neoplastic cell and exposed to the extracellular surface.

In certain embodiments, a preferred molecular marker for use in a diagnostic test to distinguish subjects likely to have a colon neoplasia from those not likely to have a colon neoplasia is a gene product of the ColoUp1, ColoUp2, ColoUp3, ColoUp4 or ColoUp5 genes. Examples of suitable gene products include proteins, both secreted and not secreted, and transcripts. In embodiments employing proteins that are not secreted, such as ColoUp3, ColoUp4 and ColoUp5, a preferred embodiment of the diagnostic test is a test for the presence of the protein or transcript in cells shed from the colon or colon neoplasia (which, in the case of metastases, is not necessarily located in the colon) into a sample material, such as stool. In embodiments employing proteins that are secreted, such as ColoUp1 and ColoUp2, a preferred embodiment of the diagnostic test is a test for the presence of the protein in a body fluid, such as urine or blood, or an excreted material, such as stool. It should be noted, however, that intracellular protein may be present in a body fluid if there is significant cell lysis or through some other process. Likewise, secreted proteins are likely to be adherent, even if at a relatively low level, to the cells in which they were produced.

In certain embodiments, a preferred molecular marker for distinguishing subjects having a colon cancer from those having an adenoma or a normal colon is a gene product of the ColoUp6 and osteopontin genes. In embodiments preferably employing marker proteins that are secreted, such as a test using a body fluid sample, a preferred marker is a secreted osteopontin protein.

ColoUp1:

A human ColoUp1 nucleic acid sequence encodes a full-length protein of 1361 amino acids. SignalP V1.1 predicts that human ColoUp1 protein has an N-terminal signal peptide that is cleaved between either amino acids 30-31 (ATS-TV) or amino acids 33-34 (TVA-AG). Four potential glycosylation sites are identified in ColoUp1 protein. Further, ColoUp1 protein is predicted to have multiple serine, threonine, and tyrosine phosphorylation sites for kinases such as protein kinase C, cAMP- and cGMP-dependent protein kinases, casein kinase II, and tyrosine kinases. The ColoUp1 protein shares limited sequence homology to a human transmembrane protein 2 (See Scott et al. 2000 Gene 246:265-74). A mouse ColoUp1 homolog is identified in existing GenBank databases and is linked with mesoderm development (see Wines et al. 2001 Genomics. 88-98; GenBank entry AAG41062, AY007815 for the 1179 bp nucleic acid sequence entry, with 363/390 (93%) identities with human ColoUp1).

As demonstrated herein, ColoUp1 is secreted from both the basolateral and apical surfaces of intestinal cells.

ColoUp2:

The ColoUp2 nucleic acid sequence encodes a full-length protein of 755 amino acids. The application also discloses certain polymorphisms that have been observed, for example at nucleotide 113 GCC→ACC (Ala-Thr); nt 480 GAA→GGA (Glu-Gly); and at nt 2220 CAG→CGG (Gln-Arg). The sequence of ColoUp2 protein is similar to that of alpha 3 type VI collagen, isoform 2 precursor. In addition, a few domains are identified in the ColoUp2 protein, such as a von Willebrand factor type A domain (vWF) and an EGF-like domain. The vWF domain is found in various plasma proteins such as some complement factors, the integrins, certain collagens, and other extracellular proteins. Proteins with vWF domains participate in numerous biological events which involve interaction with a large array of ligands, for example, cell adhesion, migration, homing, pattern formation, and signal transduction. The EGF-like domain, consisting of about 30-40 amino acid residues, has been found in many proteins. The functional significance of EGF domains is not yet clear. However, a common feature is that these EGF-like repeats are found in the extracellular domain of membrane-bound proteins or in proteins known to be secreted.
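
The amino acid changes listed for the three polymorphisms follow from the standard genetic code, as the following minimal sketch shows. The lookup table contains only the six codons named above, and the function name is hypothetical; the sketch does not use or reproduce the ColoUp2 sequence itself.

```python
# Subset of the standard genetic code covering only the codons named in the text.
CODON_TO_AMINO_ACID = {
    "GCC": "Ala", "ACC": "Thr",
    "GAA": "Glu", "GGA": "Gly",
    "CAG": "Gln", "CGG": "Arg",
}

def describe_codon_change(reference_codon, variant_codon):
    """Return (reference amino acid, variant amino acid, changed codon positions)."""
    if len(reference_codon) != 3 or len(variant_codon) != 3:
        raise ValueError("codons must be exactly three nucleotides long")
    changed = [i + 1 for i in range(3) if reference_codon[i] != variant_codon[i]]
    return (CODON_TO_AMINO_ACID[reference_codon],
            CODON_TO_AMINO_ACID[variant_codon],
            changed)

polymorphisms = [("GCC", "ACC"), ("GAA", "GGA"), ("CAG", "CGG")]
changes = [describe_codon_change(ref, var) for ref, var in polymorphisms]
```

Each of the three polymorphisms is a single-nucleotide change within its codon that alters the encoded amino acid.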

As demonstrated herein, ColoUp2 is secreted from both the apical and basolateral surfaces of intestinal cells, and can be found in the blood in two different forms, a full-length secreted form and a C-terminal fragment (approximately 55 kDa).

Osteopontin:

The Osteopontin nucleic acid sequence encodes a full-length protein of 300 amino acids. Osteopontin is an acidic glycoprotein and is produced primarily by osteoclasts, macrophages, T-cells, kidneys, and vascular smooth muscle cells. As a cytokine, Osteopontin is known to contribute substantially to metastasis formation by various cancers. In addition, it contributes to macrophage homing and cellular immunity, mediates neovascularization, inhibits apoptosis, and maintains the homeostasis of free calcium (see a review, Weber G F. 2001 Biochim Biophys Acta. 1552:61-85).

ColoUp3:

The ColoUp3 nucleic acid sequence encodes a full-length protein of 829 amino acids. ColoUp3 is referred to in the literature as P-cadherin (or cadherin 3, type 1). P-cadherin belongs to a cadherin family that includes E-cadherin and N-cadherin. P-cadherin is expressed in placenta and stratified squamous epithelia (see Shimoyama et al. 1989 J Cell Biol. 109:1787-94), but not in normal colon. P-cadherin null mice develop mammary gland hyperplasia, dysplasia, and abnormal lymphoid infiltration (see Radice et al. 1997 J Cell Biol. 139:1025-32), demonstrating that loss of normal P-cadherin expression leads to cellular and glandular abnormalities. It has been shown that P-cadherin is aberrantly expressed in inflamed and dysplastic colitic mucosa, with concomitant E-cadherin downregulation. Recently, aberrant P-cadherin expression was found to be an early event in hyperplastic and dysplastic transformation in the colon (see Hardy et al. 2002 Gut, 50:513-514).

ColoUp4:

The ColoUp4 nucleic acid sequence encodes a full-length protein of 694 amino acids. ColoUp4 is referred to in the literature as NF-E2 related factor 3 (NRF3). NRF3 was identified and characterized as a novel Cap ‘n’ collar (CNC) factor, with a basic region-leucine zipper domain highly homologous to those of other CNC proteins such as NRF1 and NRF2. These CNC factors bind to Maf recognition elements (MARE) through heterodimer formation with small Maf proteins. In vitro and in vivo analyses showed that NRF3 can heterodimerize with MafK and that this complex binds to the MARE in the chicken β-globin enhancer and can activate transcription. NRF3 mRNA is highly expressed in human placenta and in the B cell and monocyte lineages (see Kobayashi et al. 1999 J Biol Chem. 274:6443-52).

ColoUp5:

The ColoUp5 nucleic acid sequence encodes a full-length protein of 402 amino acids. ColoUp5 is referred to in the literature as FoxQ1 (Forkhead box, subclass q, member 1, formerly known as HFH-1). FoxQ1 is a member of the evolutionarily conserved winged helix/forkhead transcription factor gene family. The hallmark of this family is a conserved DNA binding region of approximately 110 amino acids (FOX domain). Members of the FOX gene family are found in a broad range of organisms from yeast to human. Human FoxQ1 gene is expressed in different tissues such as stomach, trachea, bladder, and salivary gland. FoxQ1 gene plays important roles in tissue-specific gene regulation and development, for example, embryonic development, cell cycle regulation, cell signaling, and tumorigenesis. The FoxQ1 gene is located on chromosome 6p23-25. Sequence analysis indicates that human FoxQ1 shows 82% homology with the mouse Foxq1 gene (formerly Hfh-1L) and with a revised sequence of the rat FoxQ1 gene (formerly Hfh-1). Mouse FoxQ1 was shown to regulate differentiation of hair in Satin mice. The DNA-binding motif (i.e., the FOX domain) is well conserved, showing 100% identity in human, mouse, and rat. The human FoxQ1 protein sequence contains two putative transcriptional activation domains, which share a high amino acid identity with the corresponding mouse and rat domains (see Bieller et al. 2001 DNA Cell Biol. 20:555-61).

ColoUp6:

The ColoUp6 nucleic acid sequence encodes a full-length protein of 209 amino acids. The ColoUp6 protein is 99% identical to the C-terminal portion of keratin 23 (or cytokeratin 23, or the type I intermediate filament cytokeratin), and accordingly the term ColoUp6 includes both the 209 amino acid protein (and related nucleic acids, fragments, variants, etc.) and the cytokeratin 23 amino acid sequence of GenBank entry BAA92054.1 (and related nucleic acids, fragments, variants, etc.). Keratin 23 mRNA was found highly induced in different pancreatic cancer cell lines in response to sodium butyrate. The keratin 23 protein has 422 amino acids, and has an intermediate filament signature sequence and extensive homology to type I keratins. It is suggested that keratin 23 is a novel member of the acidic keratin family that is induced in pancreatic cancer cells undergoing differentiation by a mechanism involving histone hyperacetylation (See Zhang et al. 2001 Genes Chromosomes Cancer. 30:123-35).

ColoUp7:

The ColoUp7 nucleic acid sequence is an EST sequence. No information relating to the function of the ColoUp7 gene is identified.

ColoUp8:

The ColoUp8 nucleic acid sequence encodes a full-length protein of 278 amino acids. No function has been suggested relating to the ColoUp8 gene.

Accordingly, in certain embodiments, the application provides isolated, purified or recombinant ColoUp1, ColoUp2, ColoUp3, ColoUp4, ColoUp5, ColoUp6, ColoUp7, ColoUp8 and osteopontin nucleic acids. In certain embodiments, such nucleic acids may encode a complete or partial ColoUp polypeptide or such nucleic acids may also be probes or primers useful for methods involving detection or amplification of ColoUp nucleic acids. In certain embodiments, a ColoUp nucleic acid is single-stranded or double-stranded and composed of natural nucleic acids, nucleotide analogs, or mixtures thereof. In certain embodiments, the application provides isolated, purified or recombinant nucleic acids comprising a nucleic acid sequence that is at least 90% identical to a nucleic acid sequence of any of SEQ ID Nos: 3-12, or a complement thereof, and optionally at least 95%, 97%, 98%, 99%, 99.3%, 99.5%, 99.7% or 100% identical to a nucleic acid of any of SEQ ID Nos: 3-12, or a complement thereof. In certain preferred embodiments, the application provides isolated, purified or recombinant nucleic acids comprising a nucleic acid sequence that is at least 90%, 95%, 97%, 98%, 99%, 99.3%, 99.5%, 99.7% or 100% identical to a nucleic acid of any of SEQ ID Nos: 3-12, or a complement thereof. In certain embodiments, the application provides isolated, purified, or recombinant nucleic acids comprising a nucleic acid sequence that encodes a polypeptide that is at least 90% identical to an amino acid sequence of any of SEQ ID Nos: 1-3 or 13-21, or a complement thereof, and optionally at least 95%, 97%, 98%, 99%, 99.3%, 99.5%, 99.7% or 100% identical to an amino acid sequence of any of SEQ ID Nos: 1-3 or 13-21, or a complement thereof. In certain preferred embodiments, the application provides isolated, purified or recombinant nucleic acids comprising a nucleic acid sequence that encodes a polypeptide that is at least 90% identical to an amino acid sequence of any of SEQ ID Nos: 3, 14 or 21, or a complement thereof, and optionally at least 95%, 97%, 98%, 99%, 99.3%, 99.5%, 99.7% or 100% identical to an amino acid sequence of any of SEQ ID Nos: 3, 14 or 21, or a complement thereof.
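
The percent identity thresholds recited above can be illustrated with a minimal sketch. It assumes that the two sequences have already been aligned to equal length with "-" marking gaps, and that identity is computed as matching positions divided by alignment length; other definitions of percent identity exist, and this specification does not mandate this one. The function names and example sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of alignment columns where both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A gap aligned to a gap is not counted as a match.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * matches / len(aligned_a)

def meets_identity_threshold(aligned_a, aligned_b, threshold=90.0):
    """True if the aligned sequences are at least `threshold` percent identical."""
    return percent_identity(aligned_a, aligned_b) >= threshold

# Made-up ten-nucleotide alignment with one mismatch at the last position.
identity = percent_identity("ACGTACGTAC", "ACGTACGTAT")
```

By the same arithmetic, the 363 identities over 390 aligned positions reported above for the mouse ColoUp1 homolog correspond to approximately 93% identity.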

In further embodiments, the application provides expression constructs, vectors and cells comprising a ColoUp nucleic acid. Expression constructs are nucleic acid constructs that are designed to permit expression of an expressible nucleic acid (e.g. a ColoUp nucleic acid) in a suitable cell type or in vitro expression system. A variety of expression construct systems are, in view of this specification, well known in the art, and such systems generally include a promoter that is operably linked to the expressible nucleic acid. The promoter may be a constitutive promoter, as in the case of many viral promoters, or the promoter may be a conditional promoter, as in the case of the prokaryotic lacI-repressible, IPTG-inducible promoter and as in the case of the eukaryotic tetracycline-inducible promoter. Vectors refer to any nucleic acid that is capable of transporting another nucleic acid to which it has been linked between different cells or viruses. One type of vector is an episome, i.e., a nucleic acid capable of extra-chromosomal replication, such as a plasmid. Episome-type vectors typically carry an origin of replication that directs replication of the vector in a host cell. Another type of vector is an integrative vector that is designed to recombine with the genetic material of a host cell. Vectors may be both autonomously replicating and integrative, and the properties of a vector may differ depending on the cellular context (i.e. a vector may be autonomously replicating in one host cell type and purely integrative in another host cell type). Vectors capable of directing the expression of genes to which they are operatively linked are referred to herein as “expression vectors”. Vectors that carry an expression construct are generally expression vectors. Vectors have been designed for a variety of cell types. For example, in the bacterium E. coli, commonly used vectors include pUC plasmids, pBR322 plasmids, pBlueScript and M13 plasmids. In insect cells (e.g. SF-9, SF-21 and High-Five cells), commonly used vectors include BacPak6 (Clontech) and BaculoGold (Pharmingen) (both Clontech and Pharmingen are divisions of Becton, Dickinson and Co., Franklin Lakes, N.J.). In mammalian cells (e.g. Chinese hamster ovary (CHO) cells, Vaco cells and human embryonic kidney (HEK) cells), commonly used vectors include pCMV vectors (Stratagene, Inc., La Jolla, Calif.), and pRK vectors. In certain embodiments, the application provides cells that comprise a ColoUp nucleic acid, particularly a recombinant ColoUp nucleic acid, such as an expression construct or vector that comprises a ColoUp nucleic acid. Cells may be eukaryotic or prokaryotic, depending on the anticipated use. Prokaryotic cells, especially E. coli, are particularly useful for storing and replicating nucleic acids, particularly nucleic acids carried on plasmid or viral vectors. Bacterial cells are also particularly useful for expressing nucleic acids to produce large quantities of recombinant protein, but bacterial cells do not usually mimic eukaryotic post-translational modifications, such as glycosylations or lipid-modifications, and so will tend to be less suitable for production of proteins in which the post-translational modification state is significant. Eukaryotic cells, and especially cell types such as insect cells that work with baculovirus-based protein expression systems, and Chinese hamster ovary cells, are good systems for expressing eukaryotic proteins that have significant post-translational modifications. Eukaryotic cells are also useful for studying various aspects of the function of eukaryotic proteins. For example, colon cancer cell lines are good model systems for studying the role of ColoUp genes and proteins in colon cancers.

In certain aspects the application further provides methods for preparing ColoUp polypeptides. In general, such methods comprise obtaining a cell that comprises a nucleic acid encoding a ColoUp polypeptide, and culturing the cell under conditions that cause production of the ColoUp polypeptide. Polypeptides produced in this manner may be obtained from the appropriate cell or culture fraction. For example, secreted proteins are most readily obtained from the culture supernatant, soluble intracellular proteins are most readily obtained from the soluble fraction of a cell lysate, and membrane proteins are most readily obtained from a membrane fraction. However, proteins of each type can generally be found in all three types of cell or culture fraction. Crude cellular or culture fractions may be subjected to further purification procedures to obtain substantially purified ColoUp polypeptides. Common purification procedures include affinity purification (e.g. with hexahistidine-tagged polypeptides), ion exchange chromatography, reverse phase chromatography, gel filtration chromatography, etc.

In certain aspects the application provides recombinant, isolated, substantially purified or purified ColoUp1, ColoUp2, ColoUp3, ColoUp4, ColoUp5, ColoUp6, ColoUp7, ColoUp8 and osteopontin polypeptides. In certain embodiments, such polypeptides may comprise a complete or partial ColoUp polypeptide. In certain embodiments, a ColoUp polypeptide is composed of natural amino acids, amino acid analogs, or mixtures thereof. ColoUp polypeptides may also include one or more post-translational modifications, such as glycosylation, phosphorylation, lipid modification, acetylation, etc. In certain embodiments, the application provides isolated, substantially purified, purified or recombinant polypeptides comprising an amino acid sequence that is at least 90% identical to an amino acid sequence of any of SEQ ID Nos: 1-3 or 13-21 and optionally at least 95%, 97%, 98%, 99%, 99.3%, 99.5% or 99.7% identical to an amino acid sequence of any of SEQ ID Nos: 1-3 or 13-21. In certain preferred embodiments, the application provides an isolated, substantially purified, purified or recombinant polypeptide comprising an amino acid sequence that is at least 90%, 95%, 97%, 98%, 99%, 99.3%, 99.5% or 99.7% identical to an amino acid sequence of any of SEQ ID Nos: 3, 14 or 21. In certain preferred embodiments, the application provides an isolated, substantially purified, purified or recombinant polypeptide comprising an amino acid sequence that differs from SEQ ID Nos. 3, 14 or 21 by no more than 4 amino acid substitutions, additions or deletions. Optionally, a polypeptide of the invention comprises an additional moiety, such as an additional polypeptide sequence or other added compound, with a particular function, such as an epitope tag that facilitates detection of the recombinant polypeptide with an antibody, a purification moiety that facilitates purification (e.g. by affinity purification), a detection moiety that facilitates detection of the polypeptide in vivo or in vitro, or an antigenic moiety that increases the antigenicity of the polypeptide so as to facilitate antibody production. Often, a single moiety will provide multiple functionalities. For example, an epitope tag will generally also assist in purification, because an antibody that recognizes the epitope can be used in an affinity purification procedure as well. Examples of commonly used epitope tags are: an HA tag, a hexahistidine tag, a V5 tag, a Glu-Glu tag, a c-myc tag, a VSV-G tag, a FLAG tag, an enterokinase cleavage site tag and a T7 tag. Commonly used purification moieties include: a hexahistidine tag, a glutathione-S-transferase domain, a cellulose binding domain and a biotin tag. Commonly used detection moieties include fluorescent proteins (e.g. green fluorescent proteins), a biotin tag, and chromogenic/fluorogenic enzymes (e.g. beta-galactosidase and luciferase). Commonly used antigenic moieties include the keyhole limpet hemocyanin and serum albumins. Note that these moieties need not be polypeptides and need not be connected to the polypeptide by a traditional peptide bond.
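
The criterion recited above, that a polypeptide differ from a reference sequence by no more than 4 amino acid substitutions, additions or deletions, can be illustrated with a minimal sketch that counts such changes as the edit (Levenshtein) distance between two sequences. Treating the criterion as an edit distance is an assumption made for illustration; the function names and the short example sequences are hypothetical and are not taken from the sequence listing.

```python
def edit_distance(candidate, reference):
    """Minimum number of single-residue substitutions, additions or deletions
    needed to turn `candidate` into `reference` (Levenshtein distance)."""
    previous = list(range(len(reference) + 1))
    for i, residue_c in enumerate(candidate, start=1):
        current = [i]
        for j, residue_r in enumerate(reference, start=1):
            substitution_cost = 0 if residue_c == residue_r else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # addition
                               previous[j - 1] + substitution_cost))
        previous = current
    return previous[-1]

def within_variation_limit(candidate, reference, max_changes=4):
    """True if the sequences differ by no more than `max_changes` edits."""
    return edit_distance(candidate, reference) <= max_changes

# Made-up peptides: one substitution (T to S) and one deletion (A).
distance = edit_distance("MKTLLVAG", "MKSLLVG")
```

For full-length proteins a banded or library-based alignment would be faster, but the simple dynamic program above is sufficient to show the counting rule.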

4. Antibodies and Uses Therefor

Another aspect of the invention pertains to an antibody specifically reactive with a ColoUp polypeptide, preferably antibodies that are specifically reactive with ColoUp polypeptides such as ColoUp1 and ColoUp2 polypeptides. For example, by using immunogens derived from a ColoUp polypeptide, e.g., based on the cDNA sequences, anti-protein/anti-peptide antisera or monoclonal antibodies can be made by standard protocols (See, for example, Antibodies: A Laboratory Manual ed. by Harlow and Lane (Cold Spring Harbor Press: 1988)). A mammal, such as a mouse, a hamster or rabbit, can be immunized with an immunogenic form of the peptide (e.g., a ColoUp polypeptide or an antigenic fragment which is capable of eliciting an antibody response, or a fusion protein). Techniques for conferring immunogenicity on a protein or peptide include conjugation to carriers or other techniques well known in the art. An immunogenic portion of a ColoUp polypeptide can be administered in the presence of adjuvant. The progress of immunization can be monitored by detection of antibody titers in plasma or serum. Standard ELISA or other immunoassays can be used with the immunogen as antigen to assess the levels of antibodies. In a preferred embodiment, the subject antibodies are immunospecific for antigenic determinants of a ColoUp polypeptide of a mammal, e.g., antigenic determinants of a protein set forth in SEQ ID Nos: 1-3 and 13-21, more preferably SEQ ID Nos: 1-3 or 21.

In one embodiment, antibodies are specific for the secreted proteins as encoded by nucleic acid sequences as set forth in SEQ ID Nos: 4-5. In another embodiment, the antibodies are immunoreactive with one or more proteins having an amino acid sequence that is at least 80% identical to an amino acid sequence as set forth in SEQ ID Nos: 1-3 and 13-21, preferably SEQ ID Nos: 1-3 or 21. In other embodiments, an antibody is immunoreactive with one or more proteins having an amino acid sequence that is at least 85%, 90%, 95%, 98%, 99%, 99.3%, 99.5%, 99.7% or 100% identical to an amino acid sequence as set forth in SEQ ID Nos: 1-3 and 13-21. More preferably, the antibody is immunoreactive with one or more proteins having an amino acid sequence that is at least 85%, 90%, 95%, 98%, 99%, 99.3%, 99.5%, 99.7% or 100% identical to an amino acid sequence as set forth in SEQ ID Nos: 1-3 or 21. In certain preferred embodiments, the invention provides an antibody that binds to an epitope including the C-terminal portion of the polypeptide of SEQ ID Nos: 3, 14 or 21. In certain preferred embodiments, the invention provides an antibody that binds to an epitope of a ColoUp2 polypeptide that is prevalent in the blood of an animal having a colon neoplasia, such as SEQ ID No: 3 or 21.

Following immunization of an animal with an antigenic preparation of a ColoUp polypeptide, anti-ColoUp antisera can be obtained and, if desired, polyclonal anti-ColoUp antibodies can be isolated from the serum. To produce monoclonal antibodies, antibody-producing cells (lymphocytes) can be harvested from an immunized animal and fused by standard somatic cell fusion procedures with immortalizing cells such as myeloma cells to yield hybridoma cells. Such techniques are well known in the art, and include, for example, the hybridoma technique (originally developed by Kohler and Milstein, (1975) Nature, 256:495-497), the human B cell hybridoma technique (Kozbor et al., (1983) Immunology Today, 4: 72), and the EBV-hybridoma technique to produce human monoclonal antibodies (Cole et al., (1985) Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc. pp. 77-96). Hybridoma cells can be screened immunochemically for production of antibodies specifically reactive with a mammalian ColoUp polypeptide of the present invention and monoclonal antibodies isolated from a culture comprising such hybridoma cells. In one embodiment anti-human ColoUp antibodies specifically react with the protein encoded by a nucleic acid having SEQ ID Nos: 4-12; more preferably the antibodies specifically react with the protein encoded by a nucleic acid having SEQ ID Nos: 4 or 5, and preferably a secreted protein that is produced by the expression of a nucleic acid having a sequence of SEQ ID Nos: 4 or 5.

The term antibody as used herein is intended to include fragments thereof which are also specifically reactive with one of the subject ColoUp polypeptides. Antibodies can be fragmented using conventional techniques and the fragments screened for utility in the same manner as described above for whole antibodies. For example, F(ab)₂ fragments can be generated by treating antibody with pepsin. The resulting F(ab)₂ fragment can be treated to reduce disulfide bridges to produce Fab fragments. The antibody of the present invention is further intended to include bispecific, single-chain, and chimeric and humanized molecules having affinity for a ColoUp polypeptide conferred by at least one CDR region of the antibody. In preferred embodiments, the antibody further comprises a label attached thereto and able to be detected (e.g., the label can be a radioisotope, fluorescent compound, enzyme or enzyme co-factor).

In certain preferred embodiments, an antibody of the invention is a monoclonal antibody, and in certain embodiments the invention makes available methods for generating novel antibodies. For example, a method for generating a monoclonal antibody that binds specifically to a ColoUp polypeptide, such as a ColoUp2 polypeptide, may comprise administering to a mouse an amount of an immunogenic composition comprising the ColoUp2 polypeptide effective to stimulate a detectable immune response, obtaining antibody-producing cells (e.g. cells from the spleen) from the mouse and fusing the antibody-producing cells with myeloma cells to obtain antibody-producing hybridomas, and testing the antibody-producing hybridomas to identify a hybridoma that produces a monoclonal antibody that binds specifically to the ColoUp2 polypeptide. Once obtained, a hybridoma can be propagated in a cell culture, optionally in culture conditions where the hybridoma-derived cells produce the monoclonal antibody that binds specifically to the ColoUp2 polypeptide. The monoclonal antibody may be purified from the cell culture.

Anti-ColoUp antibodies can be used, e.g., to detect ColoUp polypeptides in biological samples and/or to monitor ColoUp polypeptide levels in an individual, for determining whether or not said individual is likely to develop colon cancer or is more likely to harbor colon adenomas, or allowing determination of the efficacy of a given treatment regimen for an individual afflicted with colon neoplasia, colon cancer, metastatic colon cancer and colon adenomas. The level of ColoUp polypeptide may be measured in a variety of sample types such as, for example, in cells, stools, and/or in bodily fluid, such as in whole blood samples, blood serum, blood plasma and urine. The phrase “specifically reactive with” as used in reference to an antibody is intended to mean, as is generally understood in the art, that the antibody is sufficiently selective between the antigen of interest (e.g. a ColoUp polypeptide) and other antigens that are not of interest that the antibody is useful for, at minimum, detecting the presence of the antigen of interest in a particular type of biological sample. In certain methods employing the antibody, a higher degree of specificity in binding may be desirable. For example, an antibody for use in detecting a low abundance protein of interest in the presence of one or more very high abundance proteins that are not of interest may perform better if it has a higher degree of selectivity between the antigen of interest and other cross-reactants. Monoclonal antibodies generally have a greater tendency (as compared to polyclonal antibodies) to discriminate effectively between the desired antigens and cross-reacting polypeptides. In addition, an antibody that is effective at selectively identifying an antigen of interest in one type of biological sample (e.g. a stool sample) may not be as effective for selectively identifying the same antigen in a different type of biological sample (e.g. a blood sample).
Likewise, an antibody that is effective at identifying an antigen of interest in a purified protein preparation that is devoid of other biological contaminants may not be as effective at identifying an antigen of interest in a crude biological sample, such as a blood or urine sample. Accordingly, in preferred embodiments, the application provides antibodies that have demonstrated specificity for an antigen of interest (particularly, although not limited to, a ColoUp1 or ColoUp2 polypeptide) in a sample type that is likely to be the sample type of choice for use of the antibody. In a particularly preferred embodiment, the application provides antibodies that bind specifically to a ColoUp1 or ColoUp2 polypeptide in a protein preparation from blood (optionally serum or plasma) from a patient that has a colon neoplasia or that bind specifically in a crude blood sample (optionally a crude serum or plasma sample).

One characteristic that influences the specificity of an antibody:antigen interaction is the affinity of the antibody for the antigen. Although the desired specificity may be reached with a range of different affinities, generally preferred antibodies will have an affinity (a dissociation constant) of about 10⁻⁶, 10⁻⁷, 10⁻⁸, 10⁻⁹ or less.
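To make the effect of the dissociation constant concrete, the following minimal sketch computes the equilibrium fraction of antibody binding sites occupied under a simple 1:1 binding model, fraction = [Ag] / (Kd + [Ag]). It assumes the dissociation constants are expressed in molar units (the text above does not state units), and the antigen concentration of 1 nM and the function name are illustrative assumptions only.

```python
def fraction_bound(free_antigen_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of antibody sites occupied for simple 1:1 binding.

    Assumptions: a single class of independent sites and a known free antigen
    concentration; both arguments are in molar units.
    """
    if free_antigen_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_antigen_molar / (kd_molar + free_antigen_molar)


# Hypothetical antigen concentration (1 nM), compared across the Kd values listed above.
antigen = 1e-9
for kd in (1e-6, 1e-7, 1e-8, 1e-9):
    print(f"Kd = {kd:.0e} M -> fraction bound = {fraction_bound(antigen, kd):.4f}")
```

Under these assumptions, a lower dissociation constant gives higher occupancy at the same antigen concentration, which is why lower values are generally preferred for detecting low abundance antigens.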

In addition, the techniques used to screen antibodies in order to identify a desirable antibody may influence the properties of the antibody obtained. For example, an antibody to be used for certain therapeutic purposes will preferably be able to target a particular cell type. Accordingly, to obtain antibodies of this type, it may be desirable to screen for antibodies that bind to cells that express the antigen of interest (e.g. by fluorescence activated cell sorting). Likewise, if an antibody is to be used for binding an antigen in solution, it may be desirable to test solution binding. A variety of different techniques are available for testing antibody:antigen interactions to identify particularly desirable antibodies. Such techniques include ELISAs, surface plasmon resonance binding assays (e.g. the Biacore binding assay, Biacore AB, Uppsala, Sweden), sandwich assays (e.g. the paramagnetic bead system of IGEN International, Inc., Gaithersburg, Md.), western blots, immunoprecipitation assays and immunohistochemistry.

Another application of anti-ColoUp antibodies of the present invention is in the immunological screening of cDNA libraries constructed in expression vectors such as λgt11, λgt18-23, λZAP, and λORF8. Messenger libraries of this type, having coding sequences inserted in the correct reading frame and orientation, can produce fusion proteins. For instance, λgt11 will produce fusion proteins whose amino termini consist of β-galactosidase amino acid sequences and whose carboxy termini consist of a foreign polypeptide. Antigenic epitopes of a ColoUp polypeptide, e.g., other orthologs of a particular protein or other paralogs from the same species, can then be detected with antibodies, as, for example, by reacting nitrocellulose filters lifted from infected plates with the appropriate anti-ColoUp antibodies. Positive phage detected by this assay can then be isolated from the infected plate. Thus, the presence of ColoUp homologs can be detected and cloned from other animals, as can alternate isoforms (including splice variants) from humans.

5. Methods for Detecting Molecular Markers in a Patient

In certain embodiments, the invention provides methods for detecting molecular markers, such as proteins or nucleic acid transcripts of the ColoUp markers described herein. In certain embodiments, a method of the invention comprises providing a biological sample and probing the biological sample for the presence of a ColoUp marker. Information regarding the presence or absence of the ColoUp marker, and optionally the quantitative level of the ColoUp marker, may then be used to draw inferences about the nature of the biological sample and, if the biological sample was obtained from a subject, the health state of the subject.

Samples for use with the methods described herein may be essentially any biological material of interest. For example, a sample may be a tissue sample from a subject, a fluid sample from a subject, a solid or semi-solid sample from a subject, a primary cell culture or tissue culture of materials derived from a subject, cells from a cell line, or medium or other extracellular material from a cell or tissue culture, or a xenograft (meaning a sample of a colon cancer from a first subject, e.g. a human, that has been cultured in a second subject, e.g. an immunocompromised mouse). The term “sample” as used herein is intended to encompass both a biological material obtained directly from a subject (which may be described as the primary sample) as well as any manipulated forms or portions of a primary sample. For example, in certain embodiments, a preferred fluid sample is a blood sample. In this case, the term sample is intended to encompass not only the blood as obtained directly from the patient but also fractions of the blood, such as plasma, serum, cell fractions (e.g. platelets, erythrocytes, lymphocytes), protein preparations, nucleic acid preparations, etc. A sample may also be obtained by contacting a biological material with an exogenous liquid, resulting in the production of a lavage liquid containing some portion of the contacted biological material. Furthermore, the term “sample” is intended to encompass the primary sample after it has been mixed with one or more additives, such as preservatives, chelators, anti-clotting factors, etc. In certain embodiments, a fluid sample is a urine sample. In certain embodiments, a preferred solid or semi-solid sample is a stool sample. In certain embodiments, a preferred tissue sample is a biopsy from a tissue known to harbor or suspected of harboring a colon neoplasia.
In certain embodiments, a preferred cell culture sample is a sample comprising cultured cells of a colon cancer cell line, such as a cell line cultured from a metastatic colon cancer tumor or a colon-derived cell line lacking a functional TGF-β, TGF-β receptor or TGF-β signaling pathway. A subject is preferably a human subject, but it is expected that the molecular markers disclosed herein, and particularly their homologs from other animals, are of similar utility in other animals. In certain embodiments, it may be possible to detect a marker directly in an organism without obtaining a separate portion of biological material. In such instances, the term sample is intended to encompass that portion of biological material that is contacted with a reagent or device involved in the detection process.

In certain embodiments, a method of the invention comprises detecting the presence of a ColoUp protein in a sample. Optionally, the method involves obtaining a quantitative measure of the ColoUp protein in the sample. In view of this specification, one of skill in the art will recognize a wide range of techniques that may be employed to detect and optionally quantitate the presence of a protein. In preferred embodiments, a ColoUp protein is detected with an antibody. Suitable antibodies are described in a separate section above. In many embodiments, an antibody-based detection assay involves bringing the sample and the antibody into contact so that the antibody has an opportunity to bind to proteins having the corresponding epitope. In many embodiments, an antibody-based detection assay also involves a system for detecting the presence of antibody-epitope complexes, thereby achieving a detection of the presence of the proteins having the corresponding epitope. Antibodies may be used in a variety of detection techniques, including enzyme-linked immunosorbent assays (ELISAs), immunoprecipitations and Western blots. Antibody-independent techniques for identifying a protein may also be employed. For example, mass spectroscopy, particularly coupled with liquid chromatography, permits detection and quantification of large numbers of proteins in a sample. Two-dimensional gel electrophoresis may also be used to identify proteins, and may be coupled with mass spectroscopy or other detection techniques, such as N-terminal protein sequencing. RNA aptamers with specific binding for the protein of interest may also be generated and used as a detection reagent.

In certain preferred embodiments, methods of the invention involve detection of a secreted form of a ColoUp protein or osteopontin, particularly ColoUp1 protein or ColoUp2 protein.

Samples should generally be prepared in a manner that is consistent with the detection system to be employed. For example, a sample to be used in a protein detection system should generally be prepared in the absence of proteases. Likewise, a sample to be used in a nucleic acid detection system should generally be prepared in the absence of nucleases. In many instances, a sample for use in an antibody-based detection system will not be subjected to substantial preparatory steps. For example, urine may be used directly, as may saliva and blood, although blood will, in certain preferred embodiments, be separated into fractions such as plasma and serum.

In certain embodiments, a method of the invention comprises detecting the presence of a ColoUp expressed nucleic acid, such as an mRNA, in a sample. Optionally, the method involves obtaining a quantitative measure of the ColoUp expressed nucleic acid in the sample. In view of this specification, one of skill in the art will recognize a wide range of techniques that may be employed to detect and optionally quantitate the presence of a nucleic acid. Nucleic acid detection systems generally involve preparing a purified nucleic acid fraction of a sample, and subjecting the sample to a direct detection assay or an amplification process followed by a detection assay. Amplification may be achieved, for example, by polymerase chain reaction (PCR), reverse transcription (RT) and coupled RT-PCR. Detection of a nucleic acid is generally accomplished by probing the purified nucleic acid fraction with a probe that hybridizes to the nucleic acid of interest, and in many instances detection involves an amplification as well. Northern blots, dot blots, microarrays, quantitative PCR and quantitative RT-PCR are all well known methods for detecting a nucleic acid in a sample.

In certain embodiments, the invention provides nucleic acid probes that bind specifically to a ColoUp nucleic acid. Such probes may be labeled with, for example, a fluorescent moiety, a radionuclide, an enzyme or an affinity tag such as a biotin moiety. For example, the TaqMan® system employs nucleic acid probes that are labeled in such a way that the fluorescent signal is quenched when the probe is free in solution and bright when the probe is incorporated into a larger nucleic acid.

In certain embodiments, the application provides methods for imaging a colon neoplasm by targeting antibodies to any one of the markers ColoUp1 through ColoUp8 or osteopontin described herein; more preferably, the antibodies are targeted to ColoUp3. The markers described herein may be targeted using monoclonal antibodies which may be labeled with radioisotopes for clinical imaging of tumors or with toxic agents to destroy them.

In other embodiments, the application provides methods for administering an imaging agent comprising a targeting moiety and an active moiety. The targeting moiety may be an antibody, Fab, F(ab)₂, a single chain antibody or other binding agent that interacts with an epitope specified by a polypeptide sequence having an amino acid sequence as set forth in SEQ ID Nos: 1-3 and 13-21, preferably an epitope specified by SEQ ID No: 16. The active moiety may be a radioactive agent, such as: radioactive heavy metals such as iron chelates, radioactive chelates of gadolinium or manganese, positron emitters of oxygen, nitrogen, iron, carbon, or gallium, ⁴³K, ⁵²Fe, ⁵⁷Co, ⁶⁷Cu, ⁶⁸Ga, ¹²³I, ¹²⁵I, ¹³¹I, ¹³²I, or ⁹⁹Tc. The imaging agent is administered in an amount effective for diagnostic use in a mammal such as a human and the localization and accumulation of the imaging agent is then detected. The localization and accumulation of the imaging agent may be detected by radioscintigraphy, nuclear magnetic resonance imaging, computed tomography or positron emission tomography.

Immunoscintigraphy using monoclonal antibodies directed at the ColoUp markers may be used to detect and/or diagnose colon neoplasia. For example, monoclonal antibodies against a ColoUp marker such as ColoUp3 labeled with ⁹⁹Technetium, ¹¹¹Indium or ¹²⁵Iodine may be effectively used for such imaging. As will be evident to the skilled artisan, the amount of radioisotope to be administered is dependent upon the radioisotope. Those having ordinary skill in the art can readily formulate the amount of the imaging agent to be administered based upon the specific activity and energy of a given radionuclide used as the active moiety. Typically 0.1-100 millicuries per dose of imaging agent, preferably 1-10 millicuries, most often 2-5 millicuries are administered. Thus, compositions according to the present invention useful as imaging agents comprising a targeting moiety conjugated to a radioactive moiety comprise 0.1-100 millicuries, in some embodiments preferably 1-10 millicuries, in some embodiments preferably 2-5 millicuries, in some embodiments more preferably 1-5 millicuries.

EXEMPLIFICATION

The invention now being generally described, it will be more readily understood by reference to the following examples, which are included merely for purposes of illustration of certain aspects and embodiments of the present invention, and are not intended to limit the invention.

Example 1 Selection of Eight Molecular Markers for Colon Neoplasia

Expression micro-array profiling was used to find genes whose expression was different between normal colon and metastatic colon cancer. Normal colon and metastatic colon cancer samples were analyzed for gene expression using DNA expression microarray techniques that profiled expression patterns of nearly 50,000 genes, ESTs and predicted exons. Analysis of the data identified eight molecular markers for colon neoplasia, as shown in Table 2.

TABLE 2 Eight Selected Molecular Markers for Colon Neoplasia

Column key:
(A) (Median Liver Mets)/(Median Normal Colon)
(B) (Median Liver Mets)/(Median Normal Liver)
(C) (Minimum Liver Mets)/(Maximum Normal Colon)
(D) (Median Met Cell Lines)/(Median Normal Colon)
(E) (Median Met Xenografts)/(Median Normal Colon)

Marker Name   Example Sequences (SEQ ID Nos.)   (A)     (B)     (C)    (D)     (E)
ColoUp1       1, 2, 4, 13                       13.94   13.94   0.26   14.08   15.48
ColoUp2       3, 5, 14                           5.70    5.70   1.00    5.32    1.24
ColoUp3       7, 16                             16.36   16.36   0.80   21.50   15.68
ColoUp4       8, 17                              4.68    4.68   1.00    4.88    1.56
ColoUp5       9, 18                              4.58    4.74   1.15    4.82    4.63
ColoUp6       10, 19                             9.52    9.52   0.52   11.58    1.92
ColoUp7       11                                 9.20    9.20   0.18    4.30    9.00
ColoUp8       12, 20                             4.78    4.78   1.27    3.76    2.72
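The ratio columns of Table 2 can be sketched as follows. This is only an illustration of the arithmetic named in the column headings, not the analysis pipeline actually used; the expression values, the function name and the dictionary keys are hypothetical, and the sketch assumes strictly positive expression values so that the ratios are meaningful.

```python
from statistics import median


def marker_ratios(normal_colon, normal_liver, liver_mets):
    """Compute three of the Table 2 style ratios for one marker.

    Assumption: every expression value is strictly positive.
    """
    for values in (normal_colon, normal_liver, liver_mets):
        if not values or min(values) <= 0:
            raise ValueError("each group needs at least one positive value")
    return {
        "median_mets_over_median_normal_colon": median(liver_mets) / median(normal_colon),
        "median_mets_over_median_normal_liver": median(liver_mets) / median(normal_liver),
        "min_mets_over_max_normal_colon": min(liver_mets) / max(normal_colon),
    }


# Hypothetical expression values for a single marker.
ratios = marker_ratios(
    normal_colon=[10, 12, 15],
    normal_liver=[8, 10, 11],
    liver_mets=[30, 120, 150, 200],
)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

For positive values, a minimum-over-maximum ratio above 1 means that every metastasis sample exceeds every normal colon sample, which is a stricter criterion than a high ratio of medians.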

Osteopontin was also identified as a molecular marker having similar characteristics (Example sequences SEQ ID Nos: 6, 15). Each of these molecular markers was subjected to additional analysis in various types of colon neoplasia. In the case of ColoUp1 and ColoUp2, the microarray expression was confirmed by Northern blot and secretion of the protein was established.

Example 2 Expression Pattern of ColoUp1 in Various Cell Types

Shown in FIG. 20 is a graphical display of ColoUp1 expression levels measured for different tissue samples. ColoUp1 transcript was essentially undetectable (AI expression levels less than 0) in normal colon epithelial strips (labeled colon epithelial), in normal liver and in colonic muscle (labeled c. muscle). In contrast, ColoUp1 expression was clearly detected in premalignant colon adenomas as well as in 90% of Dukes stage B (early node negative colon cancers), Dukes stage C (node positive colon cancer), Dukes stage D (primary colon cancers with associated metastatic spread) and in colon cancer liver metastasis (labeled liver metastasis). ColoUp1 expression was also demonstrated in colon cancer cell lines (labeled colon cell lines) and in colon cancer xenografts grown in athymic mice (labeled xenografts). The expression in cell lines and xenografts confirms that colon neoplasia cells are the source of ColoUp1 expression in the tumors.

The probe for ColoUp1 was designed to recognize transcripts corresponding to gene KIAA1199, Genbank entry AB033025, Unigene entry Hs.50081. A transcript corresponding to this gene was amplified by RT-PCR from colon cancer cell line Vaco-394. The sequence of this transcript is presented in FIG. 3.

Example 3 Confirmed Gene Expression Pattern of ColoUp1

FIG. 29 shows a northern analysis using the cloned ColoUp1 cDNA that identifies a transcript running above the large ribosomal subunit (to which the probe cross hybridizes) that is not expressed in normal colon tissue samples and is ubiquitously expressed in a group of colon cancer cell lines.

FIGS. 29B and 29C show the results of northern analysis of ColoUp1 in normal colon tissue and colon neoplasms from 15 individuals with colon cancers and one individual with a colon adenoma. No normal colon sample expresses ColoUp1. However, expression is seen in 13 of 15 colon cancers, and in the one colon adenoma. Expression is seen in cancers arising in both the right and left colon, and in cancers of Dukes Stage B2, C and D.

Example 4 ColoUp1 is a Secreted Protein

The cloned ColoUp1 colonic transcript was inserted into a cDNA expression vector with a C-terminal T7 epitope tag. FIG. 30A shows a summary of the behavior of the tagged protein expressed by transfection of the vector into Vaco400 cells. An anti-T7 western blot shows expression of the transfected tagged protein detected in the lysate of a pellet of transfected cells (lane T of cell pellet) which is absent in cells transfected with a control empty expression vector (lane C of cell pellet). Moreover, serial immunoprecipitation and western blotting of T7 tagged protein from media in which V400 cells were growing (which had been clarified by centrifugation prior to immunoprecipitation) also clearly demonstrates secretion of ColoUp1 protein into the growth medium.

FIG. 30B shows the full gels demonstrating expression of tagged 409041 protein in V400 cells demonstrated by western analysis at left and shows detection of secreted 409041 protein in growth media as detected at right by serial immunoprecipitation and western analysis. (Antibody from the high level of serum in which FET cells are grown blocked the ability of staphA conjugated beads to precipitate anti-T7 bound to 409041 in growth media from FET cells.)

Example 5 Expression Pattern of ColoUp2 in Various Cell Types

Shown in FIG. 21 is the graphical display of ColoUp2 expression levels measured for different samples analyzed. ColoUp2 transcript was essentially undetectable (AI expression levels less than 0) in normal colon epithelial strips (labeled colon epithelial), in normal liver and in colonic muscle (labeled c. muscle). In contrast, ColoUp2 expression was clearly detected in premalignant colon adenomas as well as in 90% of Dukes stage B (early node negative colon cancers), Dukes stage C (node positive colon cancer), Dukes stage D (primary colon cancers with associated metastatic spread) and in colon cancer liver metastasis (labeled liver metastasis). ColoUp2 expression was also demonstrated in colon cancer cell lines (labeled colon cell lines) and in colon cancer xenografts grown in athymic mice (labeled xenografts). The expression in cell lines and xenografts confirms that colon neoplasia cells are the source of ColoUp2 expression in the tumors.

Probe ColoUp2 was designed to recognize transcripts corresponding to a noncoding EST, Genbank entry AI357412, Unigene entry Hs.157601. By 5′ RACE, database assembly, and ultimately RT-PCR, we cloned from a colon cancer cell line a novel protein encoding RNA transcript whose noncoding 3′ UTR was shown to correspond to the ColoUp2 specified EST. The full length coding sequence was determined by RT-PCR amplification from colon cancer cell line Vaco503, and sequences are provided in FIG. 4.

ColoUp2 is a “class identifier” (that is, it is higher in all colon cancer samples than in all normal colon samples). It is not expressed in normal body tissues, and it contains a signal sequence predicting that the protein product will be secreted (as well as several other recognizable protein motifs including domains from the epidermal growth factor protein and from the Von Willebrand protein).

Example 6 Confirmed Gene Expression Pattern of ColoUp2

FIG. 31 shows a northern analysis using the cloned ColoUp2 cDNA that identifies a transcript running above the large ribosomal subunit (to which the probe cross hybridizes) that is not expressed in normal colon tissue samples and is expressed in the majority of a group of colon cancer cell lines. Panel A of the figure shows the northern hybridization. The red arrow designates the ColoUp2 transcript. Above each lane is the name of the sample and the level (in parentheses) of ColoUp2 expression recorded. The black arrow designates the cross hybridizing ribosomal large subunit. Panel B shows the ethidium bromide stained gel corresponding to the blot, and the black arrows designate the large and small ribosomal subunits.

Example 7 ColoUp2 is a Secreted Protein

The cloned ColoUp2 colonic transcript was inserted into a cDNA expression vector with a C-terminal V5 epitope tag. FIG. 32 shows a summary of the behavior of the tagged protein expressed by transfection of the vector into SW480 and Vaco400 cells. An anti-V5 western blot shows (red arrows) expression of the transfected tagged protein detected in the lysate of a pellet of transfected cells (lysates western panel, lanes labeled ColoUp2/V5) which is absent in cells transfected with a control empty expression vector (lanes labeled pcDNA3.1). Moreover, serial immunoprecipitation and western blotting of V5 tagged protein from media in which V400 and SW480 cells were growing (which had been clarified by centrifugation prior to immunoprecipitation) also clearly demonstrates secretion of the ColoUp2 protein into the growth medium (panel labeled medium IP-western). Antibody bands from the immunoprecipitation are also present on the IP-western blot. Detection of secreted ColoUp2 protein was shown in cells assayed both 24 hours and 48 hours after transfection.

Example 8 Expression Pattern of ColoUp3-ColoUp8 and Osteopontin in Various Cell Types

Shown in FIGS. 22-28 are the graphical displays of ColoUp3-ColoUp8 and osteopontin expression levels measured for different samples analyzed.

Example 9 Confirmed Gene Expression Pattern of ColoUp5

Shown in FIG. 33 is a northern blot showing that ColoUp5 is expressed in colon cancer cell lines and not expressed in non-neoplastic material. FIG. 33 shows two northern blot analyses of ColoUp5 mRNA levels in normal colon tissues and a group of colon cancer cell lines (top panels). The bottom panels show the ethidium bromide stained gel corresponding to the blot. Homologs for ColoUp5 are found in other mammals, including mouse and rat, and sequence alignments are shown in FIGS. 34 and 35.

Example 10 Detection of Xenograft Derived ColoUp1 and ColoUp2 Proteins Circulating in the Blood of Mice

To determine whether ColoUp1 and ColoUp2 proteins are effective serologic markers of colon neoplasia, we derived transfected cell lines that stably expressed and secreted V5-epitope tagged ColoUp1 and ColoUp2 proteins. These cell lines were then injected into athymic mice and grown as tumor xenografts. Mice were sacrificed and serum was obtained. V5 tagged proteins were then precipitated from the serum using beads conjugated to anti-V5 antibodies. Precipitated serum proteins were run out on SDS-PAGE, and visualized by western blotting using HRP-conjugated anti-V5 antibodies (thereby eliminating visualization of any contaminating mouse immunoglobulin). FIG. 36 shows detection of circulating ColoUp2 protein in mouse serum. The ColoUp2 protein is secreted as 2 bands of 85 kDa and 55 kDa in size, of which the 55 kDa band predominates in the serum. The 55 kDa band is presumably a processed form of the 85 kDa band. This observation demonstrates that, in this mouse model, ColoUp2 is indeed a secreted marker of colon cancers and adenomas, and that ColoUp2 can gain access to and circulate stably in patient serum. This observation provides the surprising result that a processed fragment of ColoUp2 is the predominant serum form of the protein and therefore detection reagents targeted to this portion would be particularly suitable for diagnostic testing.

A time course experiment showed that ColoUp2 protein was detectable in mouse blood at the earliest time assayed, 1 week after injection of ColoUp2 secreting colon cancer cells, at which time xenograft tumor volume was only 100 mm³.

Similar observations were also made for ColoUp1, as shown in FIG. 37.

Example 11 Purification of ColoUp1 and ColoUp2 Proteins

In order to develop monoclonal antibodies against native ColoUp1 and ColoUp2 proteins, we devised a protocol for purification on Ni-NTA agarose (QIAGEN) nickel beads of recombinant His tagged ColoUp1 and ColoUp2 proteins from the media supernate of SW480 cells engineered to express these proteins. Currently we have purified both ColoUp1 and ColoUp2 proteins to sufficient purity to generate antibodies. As shown in FIG. 38, a Coomassie blue stained gel of purified ColoUp2 shows only the 85 kDa and 55 kDa size bands that correspond to the tagged ColoUp2 proteins visualized on western blot. Similarly, a Coomassie blue stained gel of purified ColoUp1 shows the preparation is highly purified and composed of a single 180 kDa band that corresponds perfectly to the size band seen on western blotting of the epitope tagged ColoUp1 protein. Thus we have purified ColoUp2 and ColoUp1 to sufficient homogeneity and yield. Scaled up purification of these proteins from a 50 liter media preparation should yield 2.5 mg of protein, more than adequate for immunizing mice and screening fusion supernates for development of monoclonal antibodies specific for native ColoUp1 and ColoUp2.

Example 12 Measuring Apical and Basolateral Secretion of ColoUp1 and ColoUp2

We expect that ColoUp2 will serve as a serologic marker for detection not only of colon cancers but also of large colon adenomas that also express ColoUp2. Adenomas, unlike colon cancers, are non-invasive. Thus, for adenomas to move ColoUp2 proteins into the circulation they would need to secrete this protein from the basolateral cell surface facing capillaries and lymphatics, rather than from the apical cell surface facing the colon lumen. To determine the polarity of ColoUp2 secretion we transiently transfected a monolayer of polarized Caco2 colon cancer cells with an expression vector for V5-epitope tagged ColoUp2 protein. This cell monolayer was grown in transwell dishes on filters that separate an upper transwell chamber (representing media exposed to the apical surface of the monolayer) from a lower transwell chamber (representing media exposed to the basolateral surface of the monolayer). Integrity of the sealing of the monolayer was assayed by measuring electrical resistance across the filters, and efficiency of transient transfection was monitored by expression of a gfp marker. Media from upper and lower chambers was harvested at 24, 48, 72, and 96 hours post transfection, and secreted tagged ColoUp2 protein was detected by western analysis directed against the V5 epitope tag. As FIG. 39 shows, characteristic 85 kDa and 55 kDa secreted forms of ColoUp2 were detected in media sampling the basolateral monolayer compartment at all time points assayed. At a single time point, 48 hours, ColoUp2 was additionally detected in media representing the apical secretion face; however, a dip in the transfilter electrical resistance at 48 hours suggests the likelihood of some leaking across the monolayer at this time point. The data clearly show secretion of ColoUp2 into the basolateral monolayer compartment, and hence establish ColoUp2 as demonstrating the requisite biology for a candidate serologic marker of colon adenomas.

As was done for ColoUp2, ColoUp1 expression vectors were used to transiently transfect Caco2 cell monolayers grown on transwell filters. Secretion of ColoUp1 was then assayed in media collected respectively from the upper and lower transwell chambers. Western blot assays demonstrated equal secretion of ColoUp1 from both apical and basolateral monolayer surfaces. Studies of ColoUp1 were done in parallel with those of ColoUp2, and electrical resistance of the ColoUp1 monolayers exceeded that of the ColoUp2 monolayers, supporting that the ColoUp1 transfected monolayers were well sealed. Additionally, levels of secreted ColoUp1 protein were similar to those of secreted ColoUp2, suggesting that ColoUp1 secretion by both apical and basolateral compartments was not simply due to overexpression. Accordingly, we predict that native ColoUp1 protein is likely secreted at least in part from the basolateral epithelial face, and hence should be detectable as a serologic marker of large colon adenomas.

Example 13 Determining the Sequence of the 55 kDa ColoUp2 fragment

The protein sequence of the C-terminal fragment of ColoUp2 that is secreted by human cell lines and detected as the predominant fragment in blood (488 aa) was determined. As described above, we have found on western blots and on purified preparations of C-terminal epitope tagged (V5-His epitope) ColoUp2 protein secreted by transfected human colon cancer cells, both a full sized band of approximately 90 kDa and a smaller, approximately 55 kDa C-terminal fragment (as demonstrated by the retention of the C-terminal epitope tag). Moreover, when these cells were injected into athymic mice, the 55 kDa C-terminal tagged protein was the predominant species detected as circulating in the mouse blood when mouse serum was analyzed by serial immunoprecipitation and western blot analysis directed against the V5 tag. The precise location of the cleavage site accounting for the C-terminal fragment was established by excising the acrylamide gel band containing the purified C-terminal fragment and performing mass spectrometry analysis of tryptic fragments from the protein. A peptide of sequence AVLAAHCPFYSWK was present only in the digest of the 55 kDa fragment, but was absent from the digest of the full length protein, demonstrating that this peptide corresponded to the unique amino terminus of the 55 kDa fragment. The complete sequence of the 55 kDa C-terminal fragment is shown in FIG. 41.
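The reasoning behind the peptide assignment can be illustrated with a minimal in-silico sketch. It assumes a simplified trypsin rule (cleavage after K or R unless the next residue is P). The flanking residues in `UPSTREAM` and `DOWNSTREAM` are hypothetical placeholders, not the actual ColoUp2 sequence; only the peptide AVLAAHCPFYSWK is taken from the text above. When the residue preceding the peptide is not K or R, the peptide appears as a separate tryptic product only if the protein already begins at that position.

```python
def tryptic_digest(seq):
    """Simplified trypsin rule: cleave after K or R unless followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        nxt = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal remainder
    return peptides

# Peptide reported in the text as unique to the 55 kDa fragment digest.
N_TERM_PEPTIDE = "AVLAAHCPFYSWK"

# HYPOTHETICAL flanking residues, invented for illustration only.
UPSTREAM = "MGSLKTEDQL"   # ends in L, so trypsin does not cut before the peptide
DOWNSTREAM = "GTEELR"

full_length = UPSTREAM + N_TERM_PEPTIDE + DOWNSTREAM
fragment = N_TERM_PEPTIDE + DOWNSTREAM  # protein already cleaved at the site

full_digest = tryptic_digest(full_length)
fragment_digest = tryptic_digest(fragment)

print("full-length digest:", full_digest)
print("fragment digest:   ", fragment_digest)
print("peptide unique to fragment:",
      N_TERM_PEPTIDE in fragment_digest and N_TERM_PEPTIDE not in full_digest)
```

In the toy full-length digest the peptide stays fused to its upstream neighbours, whereas in the fragment digest it is released on its own, which is the pattern the mass spectrometry result describes.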

INCORPORATION BY REFERENCE

All publications and patents mentioned herein are hereby incorporated by reference in their entirety as if each individual publication or patent was specifically and individually indicated to be incorporated by reference. In case of conflict, the present application, including any definitions herein, will control.

EQUIVALENTS

While specific embodiments of the subject invention have been discussed, the above specification is illustrative and not restrictive. Many variations of the invention will become apparent to those skilled in the art upon review of this specification and the claims below. The full scope of the invention should be determined by reference to the claims, along with their full scope of equivalents, and the specification, along with such variations.

1-29. (canceled)
30. An isolated cell engineered to express a recombinant nucleic acid, wherein said nucleic acid encodes an amino acid sequence that is at least 90% identical to SEQ ID NOs: 3, 14 or 21.
31. The isolated cell of claim 30, wherein the nucleic acid is operably linked to a promoter.
32. The isolated cell of claim 31, wherein the promoter is a constitutive promoter.
33. The isolated cell of claim 31, wherein said promoter is a conditional promoter.
34. The isolated cell of claim 33, wherein said conditional promoter is a prokaryotic lacI-repressible promoter, an IPTG-inducible promoter or a tetracycline-inducible promoter.
35. The isolated cell of claim 30, wherein said nucleic acid is introduced to the cell by means of a vector.
36. The isolated cell of claim 35, wherein said vector is an episome-type vector.
37. The isolated cell of claim 35, wherein said vector is an integrative vector that is designed to recombine with the endogenous genetic material of a host cell.
38. The isolated cell of claim 36, wherein the episome-type vector is capable of autonomously replicating independent from the endogenous genetic material of the cell.
39. The isolated cell of claim 37, wherein the nucleic acid is capable of autonomously replicating independent from the endogenous genetic material of the cell.
40. The isolated cell of claim 35, wherein said vector is selected from the group consisting of: pUC plasmid, pBR322 plasmid, pBlueScript® plasmid, M13 plasmid, BacPak6™, BaculoGold™, pCMV and/or pRK vectors.
41. The isolated cell of claim 30, wherein said cell is a bacterium cell.
42. The isolated cell of claim 41, wherein said cell is an E. coli cell.
43. The isolated cell of claim 30, wherein said cell is an insect cell.
44. The isolated cell of claim 43, wherein said cell is an SF-9, SF-21 or High-Five cell.
45. The isolated cell of claim 30, wherein said cell is a mammalian cell.
46. The isolated cell of claim 30, wherein said cell is a Chinese Hamster Ovary cell, a Human Embryonic Kidney cell, or a cell of the Vaco colon cancer cell line series.
47. The isolated cell of claim 30, wherein said nucleic acid comprises a nucleotide sequence that is at least 95% identical to the nucleotide sequence set forth in SEQ ID NO: 5.
48. The isolated cell of claim 30, wherein said cell is in a culture of cells.
49. An isolated cell engineered to express a recombinant nucleic acid comprising a nucleotide sequence that is at least 90% identical to the nucleotide sequence set forth in SEQ ID NO: 5.